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Abstract  Numerical  Weather  Prediction  (NWP)  is  in  a  period  of  transition.  As  resolutions  increase, 
global  models  are  moving  towards  fully  nonhydrostatic  dynamical  cores,  with  the  local  and  global 
models  using  the  same  governing  equations;  therefore  we  have  reached  a  point  where  it  will  be  nec¬ 
essary  to  use  a  single  model  for  both  applications.  The  new  dynamical  cores  at  the  heart  of  these 
unified  models  are  designed  to  scale  efficiently  on  clusters  with  hundreds  of  thousands  or  even  millions 
of  CPU  cores  and  GPUs.  Operational  and  research  NWP  codes  currently  use  a  wide  range  of  numer¬ 
ical  methods:  finite  differences,  spectral  transform,  finite  volumes  and,  increasingly,  finite/spectral 
elements  and  discontinuous  Galerkin,  which  constitute  element-based  Galerkin  (EBG)  methods.  Due 
to  their  important  role  in  this  transition,  will  EBGs  be  the  dominant  power  behind  NWP  in  the  next 
10  years,  or  will  they  just  be  one  of  many  methods  to  choose  from?  One  decade  after  the  review  of 
numerical  methods  for  atmospheric  modeling  by  Steppeler  et  al.  (2003)  [Review  of  numerical  methods 
for  nonhydrostatic  weather  prediction  models  Meteorol.  Atmos.  Pliys.  82,  2003],  this  review  discusses 
EBG  methods  as  a  viable  numerical  approach  for  the  next-generation  NWP  models.  One  well-known 
weakness  of  EBG  methods  is  the  generation  of  unphysical  oscillations  in  advection-dominated  flows; 
special  attention  is  hence  devoted  to  dissipation-based  stabilization  methods.  Since  EBGs  are  geo¬ 
metrically  flexible  and  allow  both  conforming  and  non-conforming  meshes,  as  well  as  grid  adaptivity, 
this  review  is  concluded  with  a  short  overview  of  how  mesh  generation  and  dynamic  mesh  refinement 
are  becoming  as  important  for  atmospheric  modeling  as  they  have  been  for  engineering  applications 
for  many  years. 
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1  Introduction 


Numerical  Weather  Prediction  (NWP),  which  began  with  the  work  of  Richardson  during  World  War 
I  [250],  remains  one  of  the  most  challenging  problems  in  the  computational  sciences.  The  two  main 
challenges  to  producing  an  accurate  forecast  are  1)  mathematically  modeling  atmospheric  phenom¬ 
ena  over  a  wide  range  of  physical  and  temporal  scales  (e.g.,  turbulence,  radiation,  cloud  formation), 
and  2)  harnessing  the  available  computational  resources  to  evaluate  these  models  in  an  accurate 
and  efficient  manner.  While  the  goal  of  the  first  challenge  is  probably  static  (that  is,  a  comprehen¬ 
sive  mathematical  description  of  the  atmosphere  at  a  given  time),  the  second  challenge  represents 
a  moving  target.  Computational  resources  not  only  expand;  they  change  in  character.  Richardson’s 
original  idea  of  a  "forecasting  factory"  consisting  of  thousands  of  human  computers  assembled  in  an 
amphitheater  was  never  realized;  the  first  NWP  codes  were  implemented  on  mainframe  computers. 
Mainframes  gave  way  to  minicomputers  and  later  vector  supercomputers  such  as  the  Cray  1,  2,  X-MP, 
and  Y-MP.  By  the  mid-90s,  vector  supercomputers  were  replaced  by  massively  parallel  distributed 
systems.  Now,  in  2015,  we  are  seeing  the  proliferation  of  many-core  architectures  (e.g.  GPUs)  and 
hybrid  distributed/shared  memory  architectures  (e.g.  clusters  of  many-core  processors,  heterogeneous 
clusters).  Moreover,  as  models  increase  their  accuracy  by  resolving  more  phenomena  (e.g.  resolving 
non-liydrostatic  effects,  incorporating  more  complex  moisture  parameterizations),  their  appetite  for 
High  Performance  Computing  (HPC)  resources  grow. 

The  modeling  challenge  and  computational  challenge  meet  in  the  choice  of  the  numerical  method 
used  to  discretize  the  underlying  continuum  model(s),  which  are  generally  expressed  as  systems  of 
both  partial  and  ordinary  differential  equations.  The  numerical  model,  as  this  figurative  middle-man, 
must  both  1)  accurately  represent  the  continuum  model,  and  2)  efficiently  utilize  the  hardware  used  to 
implement  the  numerical  method.  Hence,  the  numerical  method  mediates  these  two  grand  challenges 
by  adapting  to  the  hardware;  moreover,  since  NWP  models  may  take  on  the  order  of  100  man-years  to 
develop,  test,  and  deploy,  the  designers  of  the  numerical  method  should  target  their  model  to  future 
HPC  resources.  Just  as  biological  organisms  must  constantly  adapt  to  their  physical  environment, 
numerical  methods  must  adapt  to  their  computational  environment,  competing  for  available  resources. 
A  natural  question  arises:  which  numerical  methods  will  survive  and  flourish,  and  which  will  stagnate, 
decline,  and  perhaps  go  extinct? 

This  question  was  partially  addressed  in  the  review  of  the  numerical  methods  for  non-liydrostatic 
atmospheric  modeling  reported  by  Steppeler  et  al.  [282],  Based  on  some  of  the  questions  posed  in 
[282],  we  concentrate  on  a  class  of  numerical  methods  that  may  emerge  victorious  in  next  generation 
atmospheric  (and  climate)  models:  element-based  Galerkin  methods  (EBGs).  Among  other  questions, 
Steppeler  and  co-workers  asked  whether  the  numerical  error  caused  by  terrain-following  coordinates 
could  be  avoided  by  means  of  ^-coordinate  based  methods  [281;  282];  element-based  Galerkin  methods 
are  a  natural  choice  to  fulfill  this  recommendation.  Furthermore,  they  questioned  the  ability  of  low 
order  methods  to  resolve  certain  phenomena  at  high  resolution  without  affecting  accuracy:  "Experience 
from  current  models  suggests  that  approximations  of  overall  third  order  will  be  adequate."  It  is  shown 
in  this  review  how  things  have  indeed  evolved  towards  the  high  order  approach  that  Steppeler  et  al. 
were  discussing  10  years  ago  and  how  those  schemes  that  in  2003  had  not  been  used  in  operational 
mode  (because  considered  "advanced"  [281]),  are  currently  the  driving  force  behind  the  next  generation 
NWP  models. 

As  discussed  above,  element-based  Galerkin  schemes  today  are  tied  to  their  relationship  with  the 
evolution  of  computer  hardware.  We  will  see  this  in  the  sections  that  follow,  after  giving  a  short 
overview  of  the  current  trends  in  HPC  and  how  atmospheric  models  are  developing  around  this 
paradigm. 
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1.1  Trends  in  High  Performance  Computing 

Twenty-five  years  ago  (1990),  state-of-the-art  HPC  were  the  Cray  supercomputers  (e.g.  Cray  Y-MP). 
These  machines  had  a  small  number  (2  to  8)  of  expensive  custom  vector  processors,  which  perform 
a  single  instruction  on  multiple  data  (SIMD);  all  the  processors  fetched  data  from  a  bank  of  shared 
memory.  This  trend  changed  in  the  1990s  as  commodity  processors  and  memory  became  relatively 
inexpensive;  suddenly,  large  clusters  of  commodity  processors  that  utilized  distributed  memory  ar¬ 
chitectures  became  available.  Unlike  the  vector  machines,  distributed  memory  systems  require  com¬ 
munication  between  independent  processes.  At  the  present  time  (2015)  another  shift  is  occurring  as 
many-core  architectures,  with  a  relatively  small  amount  of  shared  memory,  are  being  coupled  with 
massively  parallel  systems.  These  distributed  memory  systems  eclipsed  the  older  vectorized  machines 
by  the  late  1990s,  and  vectorized  machines  are  no  longer  used  in  HPC. 

Today,  HPC  is  in  the  Petascale  era,  with  core  counts  exceeding  C>(106)  [226]  while  exascale  tech¬ 
nologies  are  rapidly  approaching.  For  instance,  the  largest  cluster  as  of  November  2014  (TopSOO1)  is 
Tianhe-2  with  3.12  million  cores  and  a  maximum  LINPACK  [80]  performance  of  33.8  PetaFLOPS. 
The  next  largest  machine  is  Titan,  a  Cray  XK7  with  560640  cores  and  a  maximum  LINPACK  per¬ 
formance  of  17.59  PetaFLOPS.  To  take  full  advantage  of  the  performance  of  these  architectures,  the 
need  for  specific  characteristics  in  new  models  drove  scientists  from  different  fields  to  go  back  to  the 
design  board  and  start  from  scratch  in  the  construction  of  their  numerical  algorithms  [118].  This  is 
required  by  the  need  for  very  specific  features  that  the  numerical  method  must  have  to  reach  very  high 
levels  of  scalability  on  the  new  machines.  The  next  section  reports  on  most  operational  and  research 
atmospheric  models  developed  until  today  with  special  emphasis  on  how  atmospheric  modelers  are 
moving  towards  numerical  methods  that  have  proved  more  scalable  on  current  and  future  computers. 


1.2  Existing  atmospheric  models  and  NWP  systems 

Table  1  shows  a  non-exhaustive  list  of  atmospheric  models  developed  until  today.  Most  of  the  listed 
codes  are  based  on  the  finite  difference  method.  Except  for  ENDGame  (UK  Met  Office),  the  Nonhydro¬ 
static  Multiscale  Model  core  of  the  NCEP  NAM,  and  EULerian  LAGrangian  (EULAG),  all  FD-based 
codes  are  limited  area  models  (LAM).  Spectral  transform  and  finite  volumes  represent  the  second  ma¬ 
jor  trend.  Codes  based  on  the  spectral  transform  are  common  for  General  Circulation  Models  (GCM) 
only.  High-order  element-based  methods  (spectral  element  method,  SEM,  and  discontinuous  Galerkin, 
DG)  follow,  while  the  finite  element  method  (FEM),  only  used  by  a  handful  of  models,  is  the  least 
common  of  all.  For  reasons  that  will  become  clearer  in  later  sections,  the  temporal  integration  schemes 
that  are  mostly  used  are  the  split-explicit  and  the  semi-implicit  methods. 
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Fig.  1:  Large  Eddy  Simulation  of  the  evolution  of  a  single  cloud  with  the  Nonhydrostatic  Unified  Model  of 
the  Atmosphere  (NUMA).  From  [214],  The  Maya®computer  graphics  software  was  used  for  the  photo-realistic 
rendering  of  the  simulation  (for  more  details  see  http  :  / / anmr.de / cloudwithmaya) . 


1.3  Traditional  approaches:  Finite  Difference  (FD)  and  Spectral  Transform  (ST)  methods 

As  noticeable  front  the  tables  above,  most  operational  NWP  codes  in  use  are  based  on  either  the 
finite  difference  (FD)  method,  or,  in  the  case  of  global  models,  the  spectral  transform  (ST)  method. 
It  is  difficult  to  find  models  using  these  methods  that  scale  optimally  on  massively  parallel  computers 
(ST  methods  due  to  their  all-to-all  communication  requirements  and  FD  due  to  non-compact  stencils 
especially  at  lrigh-order).  This  is  also  true  of  non-compact  (higlr-order)  finite  volume  methods.  In  order 
to  understand  the  strengths  and  weaknesses  of  these  traditional  approaches  and  how  EBGs  address 
some  of  their  shortcomings,  we  briefly  review  the  FD  and  ST  methods  in  this  subsection. 

Limited  area  models  (LAMs)  consider  atmospheric  flows  over  a  subsection  of  the  earth’s  surface. 
Examples  include  mesoscale  models,  which  typically  span  hundreds  of  kilometers  in  the  horizontal,  and 
cloud  resolving  models  (CRMs),  which  span  approximately  up  to  tens  of  kilometers  in  the  horizontal. 
See  an  example  of  a  simulated  single  cloud  in  Fig.  1. 

The  finite  difference  method  (FD)  is  the  method  of  choice  for  LAMs  for  several  reasons.  First, 
it  is  simple  to  implement  on  a  Cartesian  grid,  especially  if  the  curvature  of  the  earth  is  neglected. 
Unlike  EBG  methods,  or  the  finite  volume  method,  grid  generation  is  trivial  and  very  few  ancillary 
data  structures  are  needed.  Second,  it  is  very  efficient  on  a  single  processor,  or  on  a  small  number 
of  processors  within  a  shared  memory  architecture  (e.g.  vector  machines).  Third,  constructing  both 
upwinded  and  higher  order  discretizations  is  relatively  straightforward,  although  increasing  the  order 
of  accuracy  may  hurt  its  scalability  due  to  the  larger  halo  required. 

Global  models  (or  General  Circulation  Models,  GCMs)  solve  the  governing  equations  on  the  whole 
planet,  which  is  usually  approximated  as  a  sphere.  The  reader  is  referred  to  the  2007  paper  by 
Williamson  [321]  for  a  review  of  GCMs.  Many  operational  GCMs  utilize  ST,  where  spherical  harmonics 
are  used  to  represent  both  diagnostic  and  prognostic  variables  on  the  sphere.  Spherical  harmonics  are 
the  natural  basis  functions  to  solve  PDEs  on  a  sphere  since  they  are  the  eigenfunctions  of  the  negative 
Laplacian.  Hence,  great  accuracy  is  achieved  with  a  minimal  number  of  grid  points  on  the  sphere. 
In  order  to  advance  the  dynamical  equations  in  time  using  ST,  it  is  necessary  to  transform  between 
physical  and  spectral  space;  this  spectral  transform  is  evaluated  using  a  combination  of  Fourier  and 
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Legendre  transforms.  We  perform  an  elementary  complexity  analysis  of  the  ST  method  to  illustrate 
a  fundamental  bottleneck  as  the  resolution  of  NWP  models  increases. 

Letting  n  be  the  number  of  grid  points,  Fourier  transforms  are  evaluated  along  the  longitudinal 
(zonal)  direction  with  an  FFT  with  a  cost  O(nlogn);  along  the  latitudinal  (meridional)  direction, 
a  Legendre  transform  is  required  with  a  cost  of  0(n3).  Although  fast  Legendre  methods  exist,  they 
are  not  widely  used  in  NWP  since  they  have  high  cross-over  points.  Therefore  the  cost  of  the  ST 
method  is  0(nlogn  +  n3),  which  scales  adversely  as  n  increases  (e.g.  horizontal  resolution  is  increased). 
For  a  grid  spacing  greater  than  10  km,  the  hydrostatic,  rather  than  non-liydrostatic  equations  are 
the  governing  equations  in  GCMs  (we  will  touch  more  on  the  equation  sets  in  Section  2).  These 
equations  are  solved  via  a  vertical  mode  decomposition  [119]  which  results  in  a  constant-coefficient 
Helmholtz  operator.  Since  spherical  harmonics  are  exact  solutions  to  this  Helmholtz  operator,  no 
matrix  inversion  is  required.  Furthermore,  ST  have  a  very  small  dispersion  error.  ST  models  were 
developed  during  the  era  of  smaller,  shared-memory  machines  which  did  not  require  communicating 
data  across  processors.  As  the  architectures  transitioned  from  shared  to  distributed  memory,  the 
communication  overhead  became  more  important;  the  all-to-all  communication  required  by  both  the 
FFT  and  Legendre  transform  poses  a  barrier  to  scalability  (not  all  distributed-memory  hardware  can 
do  this  operation  effectively).  For  instance,  the  ST-based  model  NOGAPS,  used  by  the  U.  S.  Navy, 
could  not  scale  beyond  150  processes  at  typical  resolutions  [119].  Hence,  ST  methods,  while  both 
highly  accurate  and  efficient  at  small  processor  counts,  cannot  compete  in  the  era  of  hundreds  of 
thousands  (or  millions)  of  processors. 

To  overcome  the  limitations  of  FD  and  ST  in  the  current  era  of  massively  parallel  computers,  EBG 
methods  are  becoming  the  new  trend  in  atmospheric  modeling  for  the  same  reason  they  have  always 
been  popular  in  other  fields  of  computational  mechanics.  This  alternative  is  justified  by  the  proven 
high  parallel  efficiency  of  local  methods  [226;  320;  176;  73].  The  efficiency  of  EBGs  on  large  to  very 
large  machines  is  facilitated  by  their  small  parallel  communication  footprint.  To  understand  this  small 
footprint,  consider  Fig.  2,  where  the  grids  needed  by  a  a)  finite  element  and  by  a  b)  finite  difference 
method  are  compared.  In  Fig.  2-a,  the  grid  consists  of  nine  finite  elements  With  EBG  the  solution 
is  sought  on  an  element-wise  basis  and  each  element  communicates  information  to  the  others  only 
through  its  shared  boundaries  (nodes  in  the  case  of  CG;  faces  in  the  case  of  DG).  When  the  finite 
element  grid  is  partitioned  into  smaller  portions  of  the  global  domain,  the  only  information  that  needs 
to  be  exchanged  among  the  subdomains  of  the  partition  is  that  on  the  boundary  that  each  subdomain 
shares  with  its  neighbors.  In  contrast,  in  Fig.  2-b  the  grid  is  a  classical  structured,  rectangular  finite 
difference  grid  that  here  is  plotted  to  be  a  direct  analogue  (in  terms  of  node  count)  of  the  finite 
element  mesh.  Because  a  finite  difference  stencil  is  such  that  differentiation  on  each  node  in  the 
domain  requires  information  from  a  set  of  adjacent  nodes  that  varies  with  the  order  of  differentiation, 
when  the  domain  is  partitioned,  some  nodes  will  belong  to  two  overlapping  subdomains.  Because  of 
this,  additional  communication  is  necessary.  In  the  case  of  element-based  schemes  communication  is 
naturally  low  by  construction.  The  details  of  EBG  and  which  models  are  based  on  them  are  reviewed 
in  Section  3. 


1.4  A  Roadmap  for  Element-Based  Galerkin  Methods  and  this  Review 

Historically,  finite  element,  spectral  transform,  and  discontinuous  Galerkin  methods  have  been  devel¬ 
oped  in  relative  isolation.  In  the  past  several  decades,  especially  with  the  advent  of  spectral  elements, 
common  threads  were  identified.  The  two  most  important  ideas  are:  1)  decomposing  a  continuous  do¬ 
main  fi  into  a  finite  number  Ne  of  non-overlapping  elements  fie  and  2)  expanding  the  state  variables 
in  N  basis  functions  ipi  within  each  sub-domain  (or  element)  Ne.  In  the  first  operation,  we  express 
the  geometry  in  an  element-wise  fashion;  in  the  second  operation,  we  perform  a  Galerkin  expansion  of 
the  state  variables.  Hence,  the  moniker  Element-based  Galerkin  method.  As  discussed,  EBG  methods 
are  classified  as  either  continuous  (CG)  or  discontinuous  (DG).  Each  of  these  methods  may  be  char- 
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Fig.  2:  Examples  of  the  adjacency  pattern  for  a  finite  element  (a),  and  for  a  node  that  belongs  to  a  finite 
difference  grid  (b).  In  (a)  and  (b)  information  is  exchanged,  respectively,  element-  and  node- wise.  In  (a),  the  only 
nodes  that  allow  information  to  be  shared  between  elements  are  the  shared  nodes  on  the  boundary  of  neighboring 
elements  (blue  dots  on  the  boundary  of  .).  In  (b),  the  cross  made  of  blue  circular  nodes  and  a  central  red  node 
is  the  stencil  of  a  4th-oider  differentiation  performed  on  the  central  node.  How  these  plots  relate  to  parallelization 
is  described  in  the  text. 


acterized  by  the  number  of  elements  Ne  (or  equivalently,  the  element  diameter  h )  and  the  order  of  the 
basis  function  p.  Resolution  may  be  increased  by  increasing  either  Ne  or  p  independently,  allowing 
a  wide  range  of  combinations.  In  the  limit  of  Ne  =  1,  the  spectral  transform  (ST)  can  be  seen  as  an 
EBG  method;  however,  being  ST  a  degenerate  EBG,  in  the  rest  of  the  paper  it  won’t  be  considered 
among  the  EBG  methods.  This  h  —  p  parameter  space  is  mapped  in  Figure  3.  In  the  left  panel  (CG), 
three  numerical  methods  are  displayed:  finite  elements,  spectral  elements,  and  the  spectral  transform 
method.  Since  continuity  is  required  between  elements,  the  lowest  possible  order  p  is  one.  The  finite 
element  method  (FEM)  is  the  special  case  when  p=  1,2,3  basis  functions  are  employed,  while  the 
spectral  transform  method  is  recovered  if  a  single  element  is  used  with  a  very  large  order  p>  1.  In  the 
right  panel,  we  see  three  numerical  methods:  finite  volumes,  DG,  and  the  spectral  transform  method. 
Since  continuity  is  not  required  between  elements,  we  may  use  constant- valued  basis  functions,  which 
is  equivalent  to  cell-averaging:  hence,  we  recover  the  classical  finite  volume  (FV)  method  if  p  =  0.  For 
1  <  p  <  oo  and  Ne  >  1  we  have  DG,  while  for  large  p  and  Ne  =  1,  we  again  recover  the  ST  method. 
Gabersek  et.  al.  [103]  systematically  mapped  out  the  h-p  space  for  SEM.  They  concluded  that  poly¬ 
nomial  order  p  between  5  and  10  with  an  effective  resolution  of  Ax  =  h/p  between  0.5  and  2.0  km  is 
optimal  for  mesoscale  simulations  in  terms  of  both  accuracy  and  efficiency.  To  our  knowledge,  the  h-p 
space  for  global  non-liydrostatic  simulations  has  not  been  explored  yet. 


1.5  Scalability  of  EBG  methods 

In  the  following  we  report  on  some  recent  scalability  results  of  EBG  on  different  systems  and  for 
different  numerical  configurations.  For  a  more  theoretical  discussion  on  Galerkin  scalability,  we  refer 
to  [142;  176]. 
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Fig.  3:  EBGs  are  divided  into  two  classes:  continuous  Galerkin  (CG)  methods,  whose  solutions  are  continuous  with 
bounded  weak  derivative  (H1),  and  discontinuous  Galerkin  (DG)  methods,  whose  solutions  are  square  integrable 
(L2),  but  not  necessarily  continuous.  The  resolution  of  both  CG  and  DG  methods  may  be  characterized  by 
the  polynomial  order  p  of  their  basis  functions  and  the  number  of  elements  Ne  utilized,  or,  equivalently,  by 
the  diameter  h  oc  1  /Ne  of  each  element.  CG:  If  low  order  basis  functions  (p  =  1,2,3)  are  utilized  with  a  large 
number  of  elements,  we  recover  the  classical  Finite  Element  Method.  For  p  >  3  and  a  smaller  number  of  elements 
used,  we  have  the  spectral  element  method  (SEM).  As  p  is  increased,  Ne  may  be  decreased.  In  the  extreme  case 
of  a  single  element  (on  the  sphere)  and  p>  1,  the  ST  method  is  recovered.  If  we  are  considering  problems  in 
Cartesian  geometry,  this  extreme  case  is  generally  termed  "spectral"  or  "pseudo-spectral".  DG:  Since  DG  admits 
discontinuous  solutions,  a  constant  basis  function  p  =  0  is  admissible,  yielding  the  classical  finite  volume  (FV) 
method.  As  p  is  increased  and  Ne  decreases,  we  enter  the  arena  of  DG  methods.  As  with  CG,  if  a  single  element 
is  utilized,  the  ST  method  is  recovered.  In  this  extreme  case,  the  solution  becomes  continuous. 


1.5.1  Scalability  for  (horizontally)  explicit  time  integration 

In  global  atmospheric  simulations  the  vertical  resolution  is  usually  much  finer  than  the  horizontal. 
This  leads  to  a  much  smaller  time  scale  for  vertical  processes  than  for  horizontal  motion.  For  this 
reason,  it  is  often  more  efficient  to  solve  the  fast  processes  in  the  vertical  direction  implicitly  while 
using  explicit  time  integration  in  the  horizontal  direction,  or  liorizontally-explicit,  vertically-implicit 
(HEVI).  If  a  2D  domain  decomposition  strategy  is  adopted  where  all  the  elements  in  a  vertical  column 
are  maintained  on  a  single  processor,  HEVI  does  not  incur  any  additional  communication.  A  recent 
result  for  this  strategy  with  the  Nonhydrostatic  Unified  Model  of  the  Atmosphere  (NUMA)  [176;  117] 
is  shown  in  Fig.  4.  This  figure  shows  that  NUMA  achieves  weak  scaling  up  to  777,600  cores  and  strong 
scaling  to  about  40,000  cores;  moreover,  the  last  blue  data  point  in  this  figure  indicates  that  NUMA 
scales  in  this  fashion  to  the  limit  of  one  horizontal  element  per  core. 
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Fig.  4:  Scalability  study  with  the  atmospheric  model  NUMA  for  the  baroclinic  instability  test  case  [160]  for  three 
different  horizontal  resolutions  of  25.0km,  12.5km  and  2.78km  (given  in  the  legend).  This  scalability  study  was 
performed  on  the  Blue  Gene  Mira  of  the  Argonne  National  Lab.  The  number  next  to  each  data  point  shows  the 
average  number  of  elements  per  core.  These  simulations  use  a  cubed  sphere  mesh  generated  by  the  function  library 
p4est  [42].  All  simulations  use  6  elements  in  the  vertical  direction  with  HEVI  time  integration  and  a  fifth-order 
CG  method. 


One  important  factor  that  contributed  to  the  excellent  speedup  shown  in  Fig.  4  is  that  the  amount 
of  work  on  each  core  needs  significantly  more  runtime  than  the  time  spent  in  communicating  the  data 
among  neighboring  cores.  This  becomes  more  difficult  when  fully  explicit  time  integration  is  used  and 
when  the  spatial  discretization  order  is  reduced  (Fig.  5). 


1.5.2  Scalability  for  fully  implicit  time  integration 

Scalability  studies  with  the  model  Alya  [142;  308]  and  fully  implicit  time  integration  on  different 
machines  are  shown  in  Fig.  6. 

Alya  is  an  unstructured  finite  element  code.  The  mesh  partitioning  therefore  relies  on  the  ele¬ 
ment  graph,  whose  complexity  depends  on  the  geometry  considered.  Libraries  such  as  ZOLTAN  [28], 
SCOTCH  [49],  or  METIS  [174],  which  are  based  on  graph  partitioning  algorithms,  may  be  used  to 
decompose  an  EBG  mesh.  Just  like  NUMA,  Alya  does  not  require  halos  and  the  information  exchange 
between  neighbors  is  carried  out  on  the  interface  nodes,  that  is,  the  nodes  shared  by  different  subdo¬ 
mains.  From  the  parallelization  point  of  view,  the  load  balance  and  the  communication  scheduling  for 
these  two  codes  depend  on  the  quality  of  the  partition. 
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Fig.  5:  Scalability  study  with  NUMA  using  a  ID  semi-implicit  (HEVI)  simulation  of  a  3D  rising  thermal  bubble 
in  a  1  km3  cubed  domain  for  polynomial  degrees  4  and  8  (see  legend),  using  323  elements.  The  average  number 
of  elements  per  core  is  given  by  the  numbers  next  to  each  data  point.  This  scalability  study  was  performed  on 
the  Blue  Gene  Vesta  of  the  Argonne  National  Lab. 


Several  iterative  solvers  are  available,  and  the  selection  depends  on  the  physical  problem  con¬ 
sidered.  The  incompressible  Navier-Stokes  equations  require  the  solution  of  the  momentum  equation 
and  the  pressure  equation  [140].  For  the  first  algebraic  system,  the  GMRES  method  with  a  simple 
diagonal  preconditioning  is  efficient  in  most  of  the  cases,  and  few  iterations  are  required  to  obtain 
convergence.  For  the  pressure  equation,  a  deflated  conjugate  gradient  method  [203]  is  used  together 
with  linelet  preconditioning  [277],  which  is  very  efficient  in  the  presence  of  boundary  layers.  The  four 
scalabilities  presented  in  Fig.  6  were  obtained  for  the  Navier-Stokes  equations.  The  last  one  represents 
the  combustion  in  a  kiln,  which  consists  in  solving  the  low  Mach  Navier-Stokes  equations  together 
with  a  temperature  equation  and  chemical  reactions. 


1.6  Plan  of  the  paper 

The  rest  of  the  review  is  organized  as  follows:  in  Section  2  we  give  an  overview  of  the  different  equation 
sets  used  in  the  dynamical  cores  of  atmospheric  models.  Element-based  Galerkin  methods  within  the 
context  of  NWP  are  introduced  in  Section  3.  Since  EBG  methods  may  produces  unphysical  extrema 
(especially  high-order  EBGs),  stabilization/hltering  is  often  required:  a  review  of  some  stabilization 
methods  follows  in  Section  4.  Section  5  explores  accurate  grid  generation  within  high  resolution 
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Jugene  -  Blue  Gene/P 
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Curie  -  BullX 
France 


Blue  Waters  -  Cray  XE6 
USA 


Average  *  elements  per  CPU  Average  *  elements  per  CPU  Average  4  elements  per  CPU  Average  4  elements  per  CPU 

365k  91k  45k  730k  163k  92k  200k  50k  25k  129k  64k  42k 


Fig.  6:  Scalability  study  for  a  fully  implicit  simulation  using  Alya  [142;  308]  on  four  different  HPC  architectures. 


simulations  (e.g.  well  resolved  topography),  along  with  static  and  dynamic  grid  adaptivity.  A  summary 
is  reported  in  Section  6. 
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2  Equation  sets  for  atmospheric  modeling 


For  typical  atmospheric  scales  (1  nr  to  1000+  km),  the  earth’s  atmosphere  can  be  treated  as  a  contin¬ 
uum  governed  by  the  compressible  Navier-Stokes  equations  with  body  forces  to  model  the  effects  of 
gravity  and  the  Earth’s  rotation  (i.e.  Coriolis  force).  Although  the  gravitational  force  varies  with  both 
altitude  and  lattitude,  these  minor  perturbations  are  generally  neglected.  In  this  section,  we  neglect 
the  effects  of  moisture,  solar  radiation,  and  heat  flux  from  the  ground  and  consider  the  dry  dynamics 
of  the  atmosphere.  Let  Q  be  a  three-dimensional  domain  in  a  rotating  reference  system  x  and  let 
t  >  0  be  time.  The  state  of  dry,  stratified  air  can  be  described  by  density,  p,  pressure,  p ,  absolute 
temperature,  T,  and  velocity  field,  u, 


dpu 

~dt 


+  V  •  (pu  ®  u)  +  Vp  -S/-cr  = 


—2p(u  x  u)  —  pg, 


dp 

dt 


+  V  •  (pu)  =  0, 


(la) 

(lb) 


^  +  V-((£  +  p)u)-V-(^VT  +  u.<r)  =0, 


(lc) 


where  ui  is  the  rotational  velocity  of  the  Earth,  a  is  the  viscous  stress  tensor,  g  is  the  sum  of  true 
gravity  and  the  centrifugal  force,  and  the  total  energy,  E ,  is  given  by 


E  =  pcvT  +  -pu  •  u  +  pgr , 


(2) 


where  r  is  the  radial  distance  from  a  fixed  reference  point  at  the  center  of  the  earth.  Ecp  (2)  consists  of 
three  components:  internal  energy,  kinetic  energy,  and  gravitational  potential  energy.  For  a  Newtonian 
fluid  with  dynamic  viscosity  p,  the  viscous  stress  tensor  is  given  by 


a  =  p 


(Vu+  (Vu)T) 


i<v-u)i 


(3) 


where  2/3  is  a  constant  derived  from  the  Stokes  hypothesis  and  t  is  the  vector  transpose  [232].  The 
system  (1)  of  five  conservation  laws  in  six  unknowns  is  closed  by  the  equation  of  state  (ideal  gas  law) 
for  pressure: 


p  = 


R 

Cy 


1 


-pu  ■  u  —  pgr 


(4) 


We  note  that  Ecp  (3)  does  not  incorporate  any  effects  of  turbulent  dissipation.  Since  the  Kolmogorov 
length  scale  of  a  typical  atmospheric  problem  is  on  the  order  of  0.1  mm,  direct  numerical  simulation 
(DNS)  of  atmospheric  motion  is  not  possible  with  the  current  computational  resources.  To  properly 
account  for  unresolved  turbulent  motion  (e.g.  turbulent  dissipation),  a  sub  grid  scale  model  or  turbu¬ 
lence  closure  scheme  should  be  included.  To  simplify  the  treatment  of  the  most  commonly  used  sets 
of  equations  and  of  the  numerical  methods  discussed  below,  we  will  neglect  viscosity  and  restrict  our 
analysis  to  the  Euler  equations  (p  =  0)  and  various  approximations  utilized  in  atmospheric  modeling; 
however,  we  will  revisit  viscous  effects  in  the  context  of  stabilization  in  Section  4. 

Atmospheric  models  can  be  broadly  classified  into  three  groups:  1)  non-hydrostatic  models  based 
on  the  compressible  Euler  equations,  2)  hydrostatic  models,  which  assume  a  vertical  momentum 
balance  between  gravity  and  the  vertical  pressure  gradient  but  include  the  vertical  stratification  of 
the  atmosphere,  and  3)  sound-proof  models.  We  also  mention  the  shallow  water  model,  which  neglects 
all  vertical  motion  by  assuming  each  column  of  air  moves  as  a  rigid  body,  as  shallow  water  models 
are  often  developed  to  test  the  horizontal  propagation  of  features  by  numerical  methods  before  they 
are  applied  to  the  solution  of  the  equations  for  a  full  atmosphere. 
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The  set  of  governing  equations  constitutes  the  dynamical  core  of  the  model.  In  the  following 
sections,  we  broadly  survey  the  equation  sets  commonly  used  in  existing  operational  and  research 
atmospheric  models,  beginning  with  non-hydrostatic  models  and  ending  with  shallow  water  mod¬ 
els.  Please,  also  consult  [322].  For  a  discussion  of  the  interplay  between  the  choice  of  equation  set 
and  numerical  challenges  encountered,  consult  [299].  For  an  analysis  of  the  differences  between  non¬ 
hydrostatic,  hydrostatic,  shallow  atmosphere  (note,  not  to  be  confused  with  the  shallow  water  model) 
and  deep  atmosphere  approximations,  see  [317]. 


2.1  Non-hydrostatic  Models 


The  fully  compressible  Euler  and  Navier-Stokes  equations  model  all  the  scales  and  motions  of  the 
atmosphere.  In  NWP  the  equations  expressed  in  the  form  of  (1)  are  very  often  algebraically  manip¬ 
ulated  via  the  introduction  of  derived  physical  variables  to  help  the  physical  interpretation  of  the 
atmosphere.  For  example,  let  us  introduce  potential  temperature,  9,  which  is  the  temperature  that 
an  air  parcel  would  have  if  it  were  expanded  or  compressed  adiabatically  to  a  standard  pressure  po  = 
1000  liPa  [138] .  Potential  temperature  is  related  to  p  and  T  via  the  expression  9  =  T/n,  where 

7T  =  ( p/Po)R/Cp  (5) 


is  a  normalized  pressure  (known  as  Exner  pressure)  with  respect  to  a  reference  pressure  po-  Given  9, 
the  following  conservation  laws  for  (p,u,(?)T  are  obtained  by  transforming  Eq.  (lc): 


dpu 

~df 


+  V  •  (pu ®  u)  +  Vp  =  -pg  -  2p  (u  x  u) , 

(6a) 

ft  +  V'(PU)=0’ 

(6b) 

^  +  V  •  (pdu)  =  0. 

(6c) 

The  equation  of  state  for  pressure  (4) 


P  =  Po 


(7) 


completes  the  system.  Numerical  methods  for  the  solution  of  this  system  can  be  easily  constructed  to 
conserve  mass  and  momentum.  It  is,  however,  much  more  difficult  to  formulate  numerical  schemes  that 
also  conserve  energy.  However,  for  an  adiabatic  and  reversible  system,  entropy  is  conserved.  Entropy 
s  may  be  related  to  potential  temperature  9  via  the  relation 


s  =  cp  In  9  +  constant , 

thereby  justifying  the  use  of  9  rather  than  E. 

The  ARW-WRF  model  [271]  is  based  on  this  set,  and  so  are  the  finite  volume  model  described 
in  [2],  the  Met  Office  ENDGame  [323],  and  the  German  LM  model  [62].  The  Nonhydrostatic  Unified 
Model  of  the  Atmosphere  (NUMA)  [176;  117]  developed  at  the  Naval  Postgraduate  School  is  designed 
around  two  different  sets,  including  (6).  NUMA  is  the  underlying  dynamical  core  of  the  next  generation 
NWP  model  of  the  U.S.  Navy,  NEPTUNE. 

Constructing  the  divergence  of  flux  in  Eq.  (6)  requires  some  additional  computational  overhead; 
this  overhead  may  be  reduced  by  converting  Eqs.  (6)  to  their  advection  form: 
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ft+V-pu  =  0, 

(8b) 

dd 

(8c) 

-  +  u.W  =  0, 

again,  completed  by  an  equation  of  state  given  by  Eq.  (7).  Numerical  approximations  to  this  set  of 
equations  can  be  constructed  to  conserve  mass,  although  conservation  of  momentum  and  energy  is 
more  difficult  to  obtain.  NUMA  is  designed  to  be  able  to  handle  this  set  as  well,  although  the  flux 
form  (6)  is  the  required  formulation  when  NUMA  is  executed  in  the  discontinuous  Galerkin  mode. 

By  combining  the  definition  of  the  Exner  pressure  (Eq.  (5))  with  the  continuity  equation  in  Eqs. 
(8),  we  obtain: 

(9u 

—  +  u  ■  Vu  + cp0V7T  =  — g  —  2w  x  u,  (9a) 


dn  R  .  . 

—  +U-V-7T - 7rV-u  =  0,  (9b) 

at  cv 

BO 

—  +  u-V6»  =  0,  (9c) 

where  (7 r,u,0)’7~  is  the  vector  of  the  solution  variables  [85;  71].  The  practitioners  who  use  this  set 
justify  it  by  saying  that  it  is  self-contained  because  there  is  no  need  for  a  state  equation.  For  as  much 
as  it  is  evident  that  no  equation  of  state  is  directly  necessary,  we  still  need  to  point  out  that  the 
algebraic  computation  of  p  from  an  equation  of  state  similar  to  (4)  is  here  simply  substituted  by  the 
diagnosis  of  p  and  p  from  7r  or  9 ;  an  operation  that  is  still  necessary  when  it  comes  to  the  analysis 
of  the  forecast.  This  still  contributes  to  the  net  operation  count  to  Eq.  (4).  Eqs.  (9)  do  not  conserve 
mass,  momentum,  and  energy;  yet,  they  are  widely  used  in  operational  NWP  models  such  as  MM5 
developed  at  Penn  State  and  NCAR  [83],  NMM  based  on  the  work  by  Janjic  [166]  at  NCEP,  COAMPS 
[135]  from  the  U.S.  Naval  Research  Laboratory  (NRL),  and  HIRLAM  [256;  257]  by  a  consortium  of 
European  numerical  weather  services. 


2.1.1  Sound  waves:  anelastic  models  and  implicit  time  integration 

All  of  the  non-hydrostatic  equation  sets  described  in  the  previous  section  are  compressible;  therefore, 
they  all  contain  sound  waves  which  propagate  at  a  very  high  speed  (approximately  300  m/s)  relative 
to  meteorologically  relevant  phenomena.  If  these  equations  are  discretized  explicitly,  a  small  time-step 
must  be  utilized  in  order  to  satisfy  the  stability  criterion  based  on  the  Courant-Friedrichs-Lewy  (CFL) 
condition  [67],  thereby  increasing  the  computational  cost  of  the  model.  Since  the  vertical  grid-spacing 
is  typically  much  smaller  than  the  horizontal  grid  spacing,  the  vertically  propagating  sound  waves 
are  the  most  problematic  aspect  in  these  equation  sets.  To  bypass  the  small  time-step  requirement 
of  the  models  that  support  sound-waves,  yet  preserve  the  remaining  dynamics,  the  anelastic  model 
was  introduced  in  1953  by  Batchelor  [15]  and  later  analyzed  in  [233;  202;  10],  where  the  continuity 
equation  in  Eqs.  (6)  and  (8)  is  replaced  by 


V  •  (p(z)u)  =  0.  (10) 

I11  (10),  density  p  is  only  a  function  of  height.  An  improved  soundproof  approximation  is  the  pseudo- 
incompressible  model  proposed  by  Durran  [84;  86],  where  the  time  dependence  of  density  is  ac¬ 
counted  for,  although  density  is  a  function  of  a  time-invariant  reference  state  pressure  and  time- 
dependent  potential  temperature.  All  these  models  are  able  to  filter  sound  from  the  original  com¬ 
pressible  Euler /Navier- Stokes  equations,  but  still  account  for  the  most  important  waves  (e.g.  Rossby) 
in  the  solution  of  the  atmospheric  motion.  The  interested  reader  may  consult  the  review  [179]  for  more 
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on  the  validity  of  these  approximations.  A  step  towards  the  blending  of  soundproof  and  compressible 
Euler  equations  was  recently  investigated  in  [20]. 

The  soundproof  approximation  of  the  governing  equations  is  one  option  to  the  necessary  filtering 
of  sound  waves.  The  fully  compressible,  non-hydrostatic  equations  can,  on  the  other  hand,  be  approx¬ 
imated  in  time  via  a  semi- implicit  scheme  as  done  in  [292;  71;  291].  Because  the  fast  waves  are  treated 
implicitly  in  a  semi-implicit  approximation,  the  time  step  is  only  limited  by  the  non-linear  advective 
part  of  the  equation;  hence,  the  time-step  is  limited  by  the  advective  CFL  condition  At  <  (72\a;/||u||, 
which  is  far  less  stringent  than  the  CFL  condition  At  <  CZla;/(||u||  +cs),  where  C  is  a  constant  of 
order  one  and  cs  is  the  speed  of  sound. 

Semi-implicit  methods  are  closely  related  to  implicit-explicit  (IMEX)  methods  [189].  Semi-implicit 
is,  for  the  most  cases,  tied  to  the  combination  explicit  leap-frog  +  implicit  Crank-Nicholson,  whereas 
IMEX  can  be  viewed  as  a  generalization  that  allows  for  different  time-differencing  schemes,  as  first 
proposed  in  2009  by  Restelli  and  Giraldo  [249]  to  solve  the  fully  compressible  Navier-Stokes  of  non¬ 
hydrostatic  stratified  flows  approximated  in  space  by  DG.  Moreover,  the  IMEX+DG  by  Restelli  and 
Giraldo  is  a  general  method  applicable  to  different  Mach  regimes  for  viscous  and  inviscid  flows.  In 
2004,  Dolejsi  and  Feistauer  [76]  coupled  DG  with  an  implicit-explicit  time  discretization  scheme  to 
solve  the  Euler  equations  of  fully  compressible  flows.  In  that  paper,  the  numerical  flux  term  was  first 
discretized  in  a  fully  implicit  manner;  then,  the  implicit  numerical  flux  was  linearized  via  a  Taylor 
expansion,  resulting  in  a  linear  system  of  equations  which  is  solved  via  a  sparse  iterative  solver,  as 
opposed  to  a  more  expensive  non-linear  solver  (e.g.  Newton-Krylov)  as  required  by  a  fully  implicit 
discretization.  More  recent  work  on  IMEX  methods  includes  [87],  which  utilizes  Adams  and  backward 
difference  methods,  and  [314],  which  takes  a  horizontally-explicit  vertically- implicit  (HEVI)  approach. 

In  2013,  an  IMEX  version  of  the  Nonhydrostatic  Unified  Model  of  the  Atmosphere  NUMA  was 
introduced  in  [117].  Both  a  3D  IMEX  scheme  which  discretized  the  horizontal  and  vertical  (linear) 
operators  implicitly,  and  a  ID  IMEX  which  only  discretized  the  vertical  operators  implicitly  (HEVI), 
were  derived  and  compared  using  both  second-order  backward  difference  formulas  and  higher-order 
(up  to  order  4)  implicit  Runge-Kutta  methods. 

As  mentioned  earlier,  3D-IMEX  methods  require  the  solution  of  a  linear  system  of  equations. 
This  linear  solve  may  be  poorly  conditioned  (especially  for  large  Courant  numbers)  and  hence  com¬ 
putationally  expensive.  An  alternative  method  that  does  not  require  a  linear  solve  is  the  split-explicit 
method  [293] .  The  split-explicit  approach  relies  on  sub-time  stepping  to  treat  the  terms  that  represent 
sound  and  gravity  waves  within  one  larger  explicit  time-step  for  the  remaining  terms.  This  method  is 
common  in  atmospheric  simulations,  in  spite  of  its  low  accuracy  [318;  319]  and  potential  instabilities. 


2.1.2  N early-hydrostatic  flows 


Dynamics  in  the  atmosphere  are  characterized  by  small  variations  of  thermodynamic  quantities  with 
respect  to  some  background  state  [207;  178]: 


P(x,t)  =  p' (x.,t)  +  p(z) 
P(x,t)=pl(x,t)+p(z) 
@(x,f)  =  O'  +  O(z) 


(11a) 

(lib) 

(11c) 


where  the  primed  and  barred  quantities  represent,  respectively,  the  perturbation  and  the  background 
state  of  p,  p,  and  0.  In  Eq.  (11c),  0  =  p9.  In  typical  atmospheric  simulations,  p'  <C  p,  p'  <C  p  and 
0'  <  0.  If  the  vertical  acceleration  is  zero,  the  vertical  component  of  the  momentum  equation  reduces 
to  the  hydrostatic  balance  given  by  the  following  equation: 


dp 

dz 


=  -gp- 


(12) 
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Given  these  considerations  and  the  analysis  of  nearly-hydrostatic  flows  for  well-balanced  methods 
[30],  the  system  (6)  is  transformed  in  terms  of  perturbation  variables  where  the  Coriolis  term  is 
neglected.  Substituting  Eq.  (11)  into  Eq.  (6)  and  applying  Eq.  (12)  to  the  ^-component  yields 


+  V  •  (pu  ®  u)  +  S/p'  =  -p  g, 

(13a) 

dp' 

i+v>u)=o, 

(13b) 

+ 

<] 

II 

o 

(13c) 

Throughout  this  review,  the  primes  will  be  mostly  omitted  to  simplify  notation. 


2.2  Hydrostatic  vs  non-hydrostatic  models 

Atmospheric  models  can  be  distinguished  as  hydrostatic  and  non-hydrostatic.  If  we  assume  the  vertical 
acceleration  to  be  negligible,  the  vertical  momentum  equation  of  the  hydrostatic  system  reduces  to 
the  diagnostic  equilibrium  equation  (12).  At  every  time-step,  this  time-independent  equation  is  solved 
instead  of  the  full  equation  for  vertical  momentum.  Sound  waves  are  eliminated  in  the  vertical  direction 
[85]  but  not  in  the  horizontal  direction.  Because  the  size  of  the  domain  in  the  horizontal  direction  is 
typically  much  larger  than  the  vertical  depth  of  the  atmosphere  and  the  grid  size  along  x  and  y  may 
be  orders  of  magnitude  larger  than  the  grid  spacing  along  z,  a  much  larger  time-step  may  be  utilized. 

The  hydrostatic  approximation  has  been  a  central  approximation  of  NWP  for  the  past  four  decades 
and  is  used  in  the  Hydrostatic  Primitive  Equations  (HPE)  discussed  in  the  next  section.  This  approxi¬ 
mation  is  valid  for  horizontal  grid  spacing  larger  than  10  km  [165;  299].  The  hydrostatic  approximation 
is  still  appropriate  to  simulate  synoptic  scale  phenomena  where  the  vertical  acceleration  can  be  ne¬ 
glected,  but  is  no  longer  considered  in  any  mesoscale  simulation.  With  the  availability  of  more  powerful 
computers,  the  non-hydrostatic  formulations  described  above  are  standard  for  mesoscale  NWP.  The 
reader  should  refer  to  [167;  21;  29;  106;  124;  135;  166;  271;  324;  118]  for  more  on  the  evolution  of 
non-hydrostatic  models. 


2.2.1  Hydrostatic  Primitive  Equations 

The  hydrostatic  primitive  equations  (HPE)  govern  the  dynamics  in  synoptic  scale  (e.g.  global-scale) 
meteorology  and  are  valid  for  horizontal  resolution  coarser  than  10  km.  The  HPE  are  expressed  in 
so-called  a  coordinates  which  allow  the  boundary  condition  on  the  ground  to  be  easily  applied,  even 
in  the  presence  of  complex  orography.  The  HPE  are  derived  by  first  expressing  the  compressible  Euler 
equations  in  terms  of  pressure,  velocity,  and  potential  temperature.  A  hydrostatic  balance  is  applied 
in  the  vertical  direction,  which  removes  vertical  acceleration  from  the  momentum  equation.  Since  the 
HPE  rarely  appear  outside  of  atmospheric  and  climate  studies,  we  present  a  brief  derivation  from  the 
compressible  Euler  equations.  A  more  comprehensive  treatment  is  found  in  [138]. 

We  first  apply  a  Coriolis  term  to  the  right  hand  side  of  the  momentum  equation  (Ecp  (8a)). 
Decomposing  the  velocity  u  into  a  horizontal  u h  and  vertical  w  components,  the  horizontal  momentum 
balance  is  given  by 

^  =  --VHP-fkxuH  (14) 

Dt  p 

where  /  =  2wsina  is  the  Coriolis  constant  at  the  latitude  a  for  angular  rotation  lo.  Next,  we  transform 
Ecp  (14)  into  isobaric  coordinates  ( x,y,p ),  where  pressure  is  the  vertical  component;  this  is  a  useful 
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intermediate  step  on  the  path  to  a  coordinates.  Introducing  a  velocity  potential  <P(x,y,p,t),  it  can  be 
shown  that  Vp$=  Vp/p,  where  the  gradient  is  taken  with  respect  to  isobaric  coordinates,  yielding 


Pp^H 

Dpt 


-Vp«f-/kxuij, 


where  the  total  derivative  in  Eq.  (15)  is  defined  as 


Dp 

Dpt. 


d_ 

dt 


+  upp  •  Vpp  +  w 


d_ 

dp 


(15) 


(16) 


and  Cj  =  Dp/ Dt.  Next,  we  transform  Ecp  (15)  into  a  coordinates  via  a  =  p/pg,  where  ps  =  ps(x,y,t) 
is  the  surface  pressure.  Note  that  in  this  coordinate  system,  the  boundary  condition  on  the  ground  is 
always  <7  =  1.  Application  of  the  chain  rule  to  the  gradient  of  the  velocity  potential  yields 


Vp# 


Vo-iP—  erVlrips 


d<P 

da 


while  the  total  derivative  is  given  by 


Da 

Dat 


d 

dt 


+  Upp  •  Vpp  +  <7 


d 

da 


(17) 


(18) 


Combining  Eqs.  (17)  and  (18)  in  Eq.  (15)  yields 


chi  pp  .  chi  pp  dd> 

—zr  +  upp-VppUpp  +  cr— —  =  Vcr-5-o-Vlnps- - /kxupp 

dt  da  da 


(19) 


In  a  similar  manner,  an  equation  of  continuity  for  the  surface  pressure  ps  is  derived  from  (6a) 


^  + Vpp-(psupp)+ps|^  =  0  (20) 

along  with  a  transport  equation  for  potential  temperature  6  from  (6c) 

d0  d0 

—  +  V  •  upp  •  Vpp0  +  a-—  =  0  (21) 

dt  da 

In  each  a  level,  we  solve  for  the  prognostic  variables  q=  (ps,upp,$)T,  while  the  diagnostic  variables  are 
the  vertical  velocity  <r,  pressure  p,  and  geopotential  (f>.  Because  these  equations  are  in  exact  hydrostatic 
balance,  there  are  no  vertically  propagating  acoustic  or  gravity  waves.  By  computing  the  eigenvalues 
of  the  HPE,  it  is  shown  that  the  fastest  moving  waves  are  horizontally  propagating  gravity  waves 
[114].  Hence,  even  with  an  explicit  time  integrator,  a  much  larger  time-step  may  be  used  with  the 
HPE  than  with  the  compressible  Euler  equations.  For  this  reason,  the  HPE  form  the  basis  of  most 
global  atmospheric  and  climate  models. 


2.3  Shallow  Water  Equations  (SWE) 

The  hydrostatic  primitive  equations  require  a  solution  at  N  model  levels  (independent  a  or  pressure 
levels).  This  requires  significant  computational  effort.  The  HPE  may  be  simplified  even  further  to 
remove  all  vertical  dependence.  One  approach  is  to  expand  each  prognostic  variable  in  Ecp  (20)  in 
a  ID  Fourier  series  with  height  a  as  the  argument  and  only  retaining  the  zeroth-term  in  this  series, 
commonly  called  the  barotropic  mode.  Another  approach  is  to  start  with  the  full  compressible  Euler 
equations  and  apply  both  the  hydrostatic  approximation  given  by  Ecp  (12)  and  the  shallow  water 
approximation  where  the  deviation  of  the  geopotential  height  $  from  a  given  reference  geopotential 
o  is  small.  From  an  ocean  modeling  point  of  view,  this  assumption  is  equivalent  to  assuming  the 
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water  depth  is  small  compared  to  the  wavelength  of  the  waves  of  interest  (gravity  waves  and  Coriolis 
induced  Rossby  waves).  In  flux  form,  the  SWE  of  a  viscous  atmosphere  of  depth  h  on  a  rotating  sphere 
of  radius  r  are: 

+  V-  (<Pu®u)  =  -<PVP-  /(x  x  <5u)+px  +  zA72(<lm),  (22a) 

W  +  v-(^)  =  o 

Eq.  (22)  may  be  expressed  in  Cartesian  coordinates  instead  of  spherical  coordinates  by  applying  a 
fictitious  force  /ix,  where  p  is  the  Lagrange  multiplier;  this  approach,  which  facilitates  an  arbitrary 
spherical  grid,  was  first  proposed  by  Cote  [63]  for  the  semi-Lagrangian  solution  of  the  problem  and 
later  used  in  [110;  112]  for  the  solution  of  the  full  nonlinear  equations.  The  numerical  solution  of  SWE 
on  spherical  geometries  is  reported  by  many  authors  such  as  [242]  (FEM),  [204;  158;  294;  112]  (SEM), 
[116;  227;  303]  (DG),  [211]  (unified  CG/DG  on  different  unstructured  grids  with  static  and  dynamic 
adaptivity),  [191;  309;  316;  196]  (FV),  [278]  (comparison  between  SEM  and  FV),  [245]  (comparison 
using  different  numerical  methods). 


(22b) 


2.4  Transport  in  the  atmosphere 

In  a  typical  atmospheric  model,  there  are  multiple  forms  of  water  (e.g.  vapor,  rain,  ice);  in  a  climate 
model,  there  are  also  hundreds  of  chemical  species.  These  quantities  are  transported  and  diffused 
by  atmospheric  dynamics  and  are  classified  as  tracers.  In  turn,  these  tracers  actively  feedback  to 
dynamics  (e.g.  latent  heat  release).  To  model  these  tracers,  the  governing  equations  of  a  dry  flow 
must  be  coupled  to  a  set  of  transport-diffusion  equations  for  such  tracers.  For  simplicity,  we  describe 
how  tracers  are  treated  in  atmospheric  models  by  looking  at  the  transport  of  three  water  quantities 
only;  however,  this  approach  applies  to  an  arbitrary  number  of  tracers. 

Let  us  define  the  mixing  ratios  of  water  vapor,  cloud  water,  and  rain  as  qv  =  pv/ p,qc  =  Pc/ P  and 
qr  =  pr/p,  where  pt,lC>r  are  the  densities  of  water  vapor,  cloud,  and  rain.  Let  us  also  choose  one  of 
the  nonhydrostatic  equation  sets  described  previously  and  write  the  coupled  system  of  equations  that 
model  a  moist  atmosphere;  we  consider  system  (8)  and  write  the  following: 


^  +  u-Vu+  -Vp=  -g(l  +  e<^  -qc-qr)  -2a;  x  u  +  StUrb, 
ot  p 

(23a) 

S  +  V-(pu)=0, 

(23b) 

90 

— +  u  -V6  =  V  ■{KgVd)  +  Sg(p,d,qv,qc,qr), 

(23c) 

-r^r  +  u  ■  V®  =  V  •  ( kqiVql )  +  Sqi (p,9,qv,qc,qr),  for  i  =  v, c,r , 

(23d) 

where  e  =  R/Rv  is  the  ratio  of  the  gas  constants  of  dry  air,  R,  and  of  water  vapor,  Rv.  Moist  air 
contributes  to  the  buoyancy  of  the  flow,  so  that  the  right  hand  side  of  the  momentum  equation  must 
be  corrected  by  the  total  buoyancy  B  =  —  pg(l  +  eqv  —  qc  —  qr)-  The  diffusion  coefficients  kgi  and  kqi 
are  typically  modeled  via  an  algebraic  turbulence  closure  via 

ke  =  u/Pro  +  VtfPrt 

(24a) 

kqi  =  u/Sco  +  vt/Sct 

while  the  closure  term  Sturb  depends  on  the  turbulence  model  employed.  In  Ecp  (24), 

(24b) 

v  is  molecular 

viscosity,  vt  is  eddy  viscosity,  Scq  is  the  molecular  Schmidt  number  and  Set  is  the  turbulent  Schmidt 
number.  Typical  values  are  Scq  =  Set  =  0.7.  The  microphysical  processes  that  involve  phase  change 
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in  the  water  content  are  modeled  by  the  source/sink  terms,  SgtQi,  in  the  equations.  For  example,  in 
the  case  of  water  vapor,  Si  is  driven  by  evaporation  and  condensation.  These  terms  can  be  modeled 
and  computed  by  some  properly  designed  microphysics  scheme,  such  as  the  Kessler  [177]  scheme  for 
warm  clouds  (no  ice  involved). 

The  appropriate  numerical  discretization  of  Eqs.  (23d)  is  still  an  active  topic  of  research,  espe¬ 
cially  since  moisture  possesses  large  gradients  that  can  cause  instabilities.  In  addition,  since  the  mass 
fractions  qi  are  a  priori  non-negative,  the  numerical  discretization  should  be  monotonic  or,  at  the 
very  least,  positivity-preserving.  If,  for  example,  our  system  produced  negative  moisture,  the  physical 
parameterization  would  have  to  resolve  this  issue  in  some  way  (e.g.  clipping  the  negative  values);  in 
addition,  the  resulting  incorrect  feedback  may  pollute  the  overall  solution  and  cause  artificial  rain  to 
be  produced  by  the  model.  The  words  of  John  P.  Boyd  are  an  amusing  conclusion  to  this  paragraph: 
"[...]  Clever  adaptive  algorithms  that  work  for  smooth,  straight  shocks  disintegrate  into  computational 
anarchy  when  flayed  by  gravity  waves,  assaulted  by  moist  convective  instability,  battered  by  highly 
temperature-sensitive  photochemistry,  and  coupled  to  the  vastly  different  time  and  space  scales  of 
the  ocean]...]"  (SIAM  News,  Multiscale  Numerical  Algorithms  for  Weather  Forecasting  and  Climate 
Modeling:  Challenges  and  Controversies.  Nov  2008,  Vol.41  issue  9).  Monotonic  solutions  are  certainly 
more  difficult  to  achieve  with  high  order  numerical  methods.  The  problem  is  particularly  challenging 
when  the  transport  equation  is  solved  by  high  order  methods  such  as  spectral  elements  or  DG.  High 
order  methods  produce  Gibbs  oscillations  near  sharp  gradients;  these  oscillations  are  unphysical  and 
are  exacerbated  by  increasing  the  order.  Hence,  limiters  [187]  or  adaptive  filtering  is  necessary.  We  will 
address  this  problem  in  Section  4,  along  with  some  issues  involved  with  unstable  Galerkin  solutions. 


2-4-1  Cloud  microphysics:  Kessler  parameterization 

Cloud  microphysics  include  all  thermo-physical  processes  at  the  scales  of  the  particles  that  form  the 
cloud.  Examples  are  the  phase  change  of  water  quantities  or  the  agglomeration  of  particles  into  larger 
ones.  Most  physical  processes  typical  of  storm  dynamics  (e.g.,  precipitation,  freezing,  deposition, 
or  sublimation)  have  physics  across  a  large  range  of  spatial  and  temporal  scales  that  makes  direct 
numerical  simulation  unfeasible  (see  [88],  Ch.  10).  For  this  reason,  parameterization  is  commonly  used 
within  numerical  models.  Microphysical  parameterization  relies  on  the  physical  knowledge  of  certain 
processes  without  the  need  for  fully  resolving  all  the  microscale  processes  that  are  involved.  The  clear 
limitation  is  that  certain  phenomena  cannot  be  represented  with  high  accuracy  if  they  lie  outside 
of  the  conditions  required  by  the  parameterization,  differentiation  A  simple  representation  of  cloud 
microphysics  was  designed  by  Kessler  and  reported  in  his  monograph  [177]. 

Kessler’s  is  a  bulk  model,  meaning  that  water  species  are  categorized  only  with  respect  to  the 
particles’  type.  In  other  words,  if  we  speak  about  cloud  water,  we  would  model  it  through  one  equation 
that  represents  the  transport  of  cloud  water  concentration  with  water  droplets  of  one  single  size.  Bulk 
models  are  contrasted  by  explicit  models,  where,  within  each  category  (e.g.,  cloud,  rain)  the  size  of 
the  water  particles  is  considered  as  well.  Explicit  models  are  certainly  more  physically  accurate,  but 
they  are  more  costly  due  to  the  greater  number  of  quantities  that  must  be  accounted  for.  For  more 
information  on  the  topic  refer  to  Houze’s  book  [139]  and  to  more  recent  literature  (e.g.  [39]). 

Kessler’s  is  a  simple  scheme  based  on  the  main  assumption  that  ice  is  not  contemplated  (warm 
rain).  The  main  limitation  of  the  warm  condition  is  that  only  moist  convection  at  the  tropics  or  at 
mid-latitudes  in  the  warm  season  can  be  represented.  The  three  forms  of  water  that  are  considered 
are:  (i)  water  vapor;  (ii)  cloud  water  (liquid  water  whose  size  is  so  small  that  its  terminal  fall  speed 
is  negligible);  and  (iii)  precipitating  water  that  only  includes  rain  (namely,  drops  whose  diameter  is 
>  0.5  mm).  Drizzle  is  excluded  (rain  of  drop  diameter  between  0.2  and  0.5  mm). 

The  main  processes  resolved  by  a  warm  cloud  microphysics  scheme  are  briefly  described  below. 
These  processes  dictate  how  the  source  terms  of  the  previous  equations  are  defined  and  how  they 
affect  the  dynamics  of  the  simulation.  The  reader  is  referred  to,  e.g.,  [139]  and  references  therein  for 
a  more  thorough  analysis. 
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Given  the  approximated  Teten’s  formula  [27]  for  the  saturation  vapor  pressure, 


e*  =  611.2exp 


17.67T 
T  + 243.5 


the  saturation  mixing  ratio  is  given  by 


From  [181],  the  source  terms  in  (23)  are 


Se 


rp  ( QvS  +  Er)  , 

CpT 


(25) 

(26a) 


Sqv=qVs  +  Er,  (26b) 

Sqc  =  Qvs  Ar  Gr,  (26c) 

Sqr  = -  —  (pVrqr)  —  Er  +  Ar +Cr,  (26d) 

where  cpi  and  cpv  are  the  heat  coefficients  at  constant  pressure  of  liquid  water  and  water  vapor, 
respectively,  Lv  =  Lv q  —  ( cpi  —  cpv)(T  —  To)  is  the  latent  heat  of  vaporization  with  reference  value 
Lvq  =  2.5e  +  6  Jkg~1,  Tq  is  a  reference  temperature,  Vr  is  the  terminal  fall  speed  of  raindrops  (taken 
positive  in  the  downward  direction),  and  q'vs  is  the  rate  of  condensation  or  evaporation  (the  dot 
symbol  indicates  differentiation  with  respect  to  time).  Ar,Cr ,  and  Er  are  the  rates  of  autoconversion, 
collection,  and  evaporation  of  rain.  They  are  computed  using  the  formulas: 


Ar  =  MAX  (0,  k\  (qc  —  ax)) , 

(27a) 

a  =  fc2P0-37W-875, 

(27b) 

E  _  1  {qv/qvs  -  i)k(PQr)0525 

r  p  5.4  x  105  +  2.55  x  106(p?„s)’ 

(27c) 

where  k\  =  0.001  s^1,  &2  =  2.2  s^1,  ax  =  0.001  kg  kg-1  are  Kessler’s  parameters  and  k  is  the  ventilation 
factor  that  is  a  function  of  the  terminal  fall  speed.  Eq.  (27a)  was  derived  by  Kessler  considering  that 
a  cloud  is  converted  into  rainwater  whenever  qc  exceeds  a  threshold  ax-  Autoconversion  is  the  rate  at 
which  the  rain  water  content  increases  at  the  expense  of  cloud  water  due  to  the  coalescence  of  smaller 
particles.  Yet,  this  process  is  not  fully  understood.  Nor  is  it  fully  understood  how  collection  occurs.  As 
the  name  suggests,  collection  can  be  explained  as  cloud  water  particles  being  collected  by  the  falling 
larger  rainwater  droplets  that  go  through  the  cloud  layers  during  their  fall.  Evaporation  occurs  when 
the  sensible  heat  flux  from  the  environment  into  the  water  droplet  is  balanced  by  the  latent  heat  of 
evaporation  of  the  water  particle.  As  in  [276],  the  cloud  droplets  move  at  the  same  speed  of  the  flow 
because  they  are  considered  having  negligible  terminal  velocity. 

The  values  of  the  constants  in  (27)  are,  to  a  certain  extent,  arbitrary  [139];  however,  by  the 
observations,  it  is  of  common  agreement  that  ki,k2  and  ax  are  non-linear  terms  with  respect  to  qc 
itself.  They  are  also  a  function  of  temperature  and  of  the  distribution  of  the  condensation  nuclei. 
As  it  is  pointed  out  in  Emanuel’s  book  [88],  the  lack  of  understanding  of  the  underlying  physics  is 
such  that  different  results  are  being  obtained  by  different  and  more  sophisticated  schemes.  However, 
this  topic  is  beyond  the  scope  of  the  present  article.  Nevertheless,  it  is  important  to  emphasize  that 
microphysical  parameterization  has  a  major  role  in  forecasting  clouds  and  precipitation,  but  is  still 
an  active  field  of  investigation  (see  the  2008  paper  by  Morrison  and  Grabowski  [223]). 
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2-4-2  Method  of  solution  via  saturation  adjustment 

Regardless  of  the  type  of  space  approximation,  phase  changes  are  classically  treated  via  the  saturation 
adjustment  technique  explained  in  detail  in  the  appendix  of  [276].  Saturation  adjustment  -or  fractional 
step  method-  is  not  the  only  option;  however,  due  to  its  simplicity,  it  is  convenient  to  describe  it  here 
to  give  a  sense  of  how  phase  change  is  accounted  for  in  these  models. 

The  saturation  adjustment  technique  consists  of  solving  the  problem  in  two  steps.  First,  the 
prognostic  equations  are  solved  by  neglecting  all  the  terms  that  involve  phase  changes  (all  the  S- 
terms  are  set  to  zero).  This  means  that  the  dynamics  and  transport  equations  are  advanced  for¬ 
ward  to  an  intermediate  time-step  n*  so  that  the  intermediate  values  of  the  prognostic  variables, 
{p,p,Op,qvs,qv,qc,qr)*,  are  obtained.  These  values  are  plugged  into  the  Kessler  module  to  compute 
the  ^-quantities  defined  above.  Once  the  computation  of  S  has  completed,  thermodynamic  variables 
are  updated  and  returned  to  the  Euler/transport  solver  as  the  initial  values  for  the  next  time  step 
n+  1. 
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3  Element-based  Galerkin  methods:  finite  elements,  spectral  elements,  and  nodal 
discontinuous  Galerkin 

As  discussed  in  the  EBG  roadmap,  the  finite  element  (FEM),  spectral  element  (SEM),  and  discontin¬ 
uous  Galerkin  (DG)  methods,  are  specific  types  of  Galerkin  approximation  techniques.  In  this  section, 
we  introduce  the  ideas  behind  Galerkin  schemes  in  general  and  then  distinguish  between  FEM,  SEM, 
and  DG  in  particular.  We  then  trace  the  history  of  EBG  methods  in  NWP  and  climate  modeling  over 
the  past  twenty  years. 


3.1  Element-based  continuous  Galerkin  methods 

The  birth  of  Galerkin  methods  dates  back  to  Boris  Grigoryevic  Galerkin  and  his  work  on  the  numerical 
solution  of  the  equations  of  the  elastic  equilibrium  of  rods  and  plates  [105],  and  to  the  original 
ideas  of  Walter  Ritz  [253]  six  years  earlier.  Popularized  by  Courant  in  the  early  40s  for  the  study  of 
vibration  and  equilibrium  [66]  and  extensively  developed  only  in  the  late  1950s  and  1960s  by  structural 
dynamicists  in  the  aircraft  industry  [4],  finite  element  methods  in  particular  are  among  the  most 
common  numerical  methods  based  on  a  Galerkin  approach  and  that  are  used  today  in  a  wide  range  of 
applications.  They  are  used  in  industry  and  for  research  purposes  in,  e.g.,  structural  analysis  [325],  fluid 
dynamics  [334],  and  electromagnetism  [14].  Galerkin  based  methods  are  a  robust  tool  for  the  solution 
of  any  differential  problem  [79]  and  are  accepted  by  scientists  and  engineers  in  theoretical  studies 
and  applications  for  a  series  of  reasons  such  as  the  ease  in  modeling  complex  geometries,  the  flexible 
and  general  purpose  programming  format  that  they  imply,  and  the  intrinsic  treatment  of  differential- 
type  boundary  conditions.  In  the  following,  we  will  describe  the  idea  behind  the  method  of  weighted 
residuals,  of  which  the  Galerkin  finite  element,  spectral  element,  and  discontinuous  Galerkin  methods 
represent  special  cases.  For  a  simple  but  quasi-rigorous  analysis  of  the  method  we  use  a  problem 
of  real  engineering  interest  and  that  is  a  fundamental  problem  in  numerical  weather  prediction:  the 
advection-diffusion  equation.  The  reader  is  referred  to  the  books  by  Fletcher  [95]  or  by  Karniadakis 
and  Sherwin  [173]  as  a  reference  for  the  more  mathematical  aspects  of  Galerkin  methods,  and  to 
the  lecture  notes  by  Giraldo  [115]  for  a  unified  treatment  of  high-order  continuous  and  discontinuous 
Galerkin  methods. 

Let  us  take  a  general  differential  problem 


C(q)  =  S,  (28) 

where  L  is  the  combination  of  both  linear  and  non-linear  differential  operators  in  space  x  and  time 
t,  and  S  is  a  source  function.  Let  d  indicate  the  space  dimension  and  let  fi  C  Rd  be  the  domain  with 
the  boundary  dfi  where  (28)  is  defined  within  the  time  interval  (0 ,ty),  and  tj  G  R+.  For  the  problem 
to  be  well-posed,  suitable  boundary  and  initial  conditions  must  be  added  to  (28).  Unless  otherwise 
stated,  given  a  known  function  g,  Dirichlet  boundary  conditions  q(x)  =  g  for  x  6  dfl  will  be  applied 
to  the  problems  described  throughout  this  section. 

As  previously  stated,  Galerkin  methods  are  a  particular  case  of  the  method  of  weighted  residuals. 
The  idea  behind  this  method  is  the  numerical  representation  of  the  solution  variable  q  by  a  finite 
dimensional  approximation  qh  obtained  by  the  expansion 

N 

**(x)  =  5>(x)&,  (29) 

k— 0 

where  N  is  the  number  of  nodes  pk  of  a  possible  partition  of  the  domain  Q.  On  its  discrete  counterpart, 
J2  ,  a  set  of  k  0,v, .,.Y  known  analytic  test  functions  ipk  are  defined  (The  two  terms  test  and  basis 
will  be  used  interchangeably.  The  unknown  coefficients  q t-  correspond  to  the  physical  values  of  q  at 
node  pfc.  The  finite  difference  method  is  conceptually  different  in  that  what  is  approximated  in  the 
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differential  problem  are  the  differential  operators  and  not  the  solution  variable.  Substitution  of  (29) 
into  (28)  is  such  that  C(qh)  —  S  ^  0.  The  method  is  called  method  of  weighted  residuals  because  a 
linear  system  of  algebraic  equations  in  the  unknowns  q  is  built  by  imposing  that 

/  wRdfl  =  0,  (30) 

Jo 

where  R  =  C  —  S  is  the  (non-zero)  residual  of  (28)  and  w  is  the  weight  function  that  has  certain 
properties.  Different  methods  arise  from  the  selection  of  different  w.  The  Bubnov-Galerkin  method  is 
found  when  w  =  ipk-  We  can  then  write  the  following: 


/  ip[C-S\dQ  =  0. 

Jo 


This  is  the  weak  form  of  the  original  equation  to  be  solved. 


(31) 


Remark  3.1.  So  far,  no  distinction  between  the  finite  and  spectral  element  methods  has  been  made. 
The  difference  stems  from  the  definition  of  the  interpolation  points  used  to  construct  ipk- 


3.1.1  Suitable  function  spaces 


The  choice  of  basis  and  test  functions  depend  on  the  operator  C  under  consideration.  In  the  specific 
cases  of  the  advection-diffusion  equation  and  the  Navier-Stokes  equations  of  compressible  flows,  the 
highest  order  of  the  derivatives  is  2,  and  the  choice  of  the  basis  functions  and  the  space  to  which  they 
belong  must  depend  on  this  regularity  condition. 

We  show  that  the  weak  solutions  to  a  linear  elliptic  operator  must  resides  in  the  Sobolev  space 
H1(fl).  Consider  an  operator  C  defined  on  a  global  domain  17  with  boundary  T  acting  on  a  state 
vector  g;  specifically,  consider  the  elliptic  operator 

C{q)  =  S7-{vS7q)  (32) 

where  v  >  0  is  the  kinematic  viscosity.  We  consider  the  boundary- value  problem 

£(q)  =  o  (33) 

with  a  Dirichlet  boundary  condition  g(x)  =  gg(x)  for  all  xef  and  go(x)  £  C1  (T).  Consider  a  test 
function  ip  G  L2  (17)  and  also  assume  q  G  L2  (17).  The  following  calculation  demonstrates  that  q,ip  G 
H1  C  L2. 

Integrating  Ecp  (32)  by  parts  yields 


vS7ip  -V gd!7  =  /  ipS/ ■  (yS/q)  dr. 


(34) 


Since  go  £  C1  and  ip  G  L2 ,  the  right  hand  side  of  Ecp  (34)  is  bounded,  implying  that  the  left  hand  side 
is  bounded  as  well.  We  then  write  that: 


<  /  \vS7ip -SI q\  dft  (35) 

Jo 

<*'l|VV’||i2||Vg|L2 

where  the  second  line  follows  from  the  Cauchy-Schwartz  inequality.  Hence,  both  the  norms  ||V'!/>||i2 
and  1 1  Vg  ||L2  are  bounded,  implying  that  SI  ip  and  Vg  are  square-integrable  over  the  global  domain  17. 
In  other  words,  ip,q  G  H1  (17). 

With  regards  to  CG  methods,  this  elementary  calculation  illustrates  two  key  points: 


vS/ip  ■  S/qdQ 
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1.  The  space  of  test  functions  must  be  a  subset  of  if1. 

2.  Since  H 1  C  C°,  the  state  vector  q  is  necessarily  continuous. 

We  hence  define  the  space  W  of  test  functions  ip  and  trial  solutions  q  as  a  subset  of  H 1  such  that 
W  =  {ip,q  6  if1(J7)  s.t.  ip  =  0  and  q  =  g  on  912} .  (36) 


3.2  Finite  and  Spectral  Elements:  discretization  and  basis  functions 

To  discretize  the  problem  in  a  finite  and  spectral  element  sense,  the  domain  t Q  is  first  decomposed 
into  a  finite  element  partition  Vh  =  {Kl}i=i,...,nei  of  nei  conforming  elements  Kl  such  that 


nel 

nel 

Q  —  Kl ,  and 

2=1 

f>-  =  o, 

2=1 

(37) 

where  every  element  Kl  is  the  image  of  the  reference  element  /  by  a  non-singular  bijective  mapping 
x  =  %*(£)  from  physical  space  x  to  computational  space  £.  J  =  dx/d£  is  the  transformation  Jacobian 
matrix.  A  two-dimensional  example  of  the  map  is  represented  in  Fig.  7. 

The  need  for  mapping  is  purely  practical  and  forms  the  foundations  of  the  finite  element  compu¬ 
tation.  For  details  see  [157]. 

Basis  functions:  Finite  Elements.  Lagrange  basis  functions  are  a  common  choice  in  finite  elements 
since  they  interpolate  a  continuous  function  exactly  at  the  nodes  Xi.  These  functions,  defined  by  h k 
from  now  on,  have  the  property  of  being  piecewise  continuous  and  are  such  that 

hk(xi)  =  5kt  k,l  =  0,...,N , 

where  Ski  is  the  Kronecker  delta. 

For  linear,  quadratic,  and  cubic  finite  elements,  the  roots  of  the  basis  function  along  the  reference 
element  I  are  the  N+l  equi-spaced  nodes  within  the  element.  Using  the  definition  of  the  Lagrange 
polynomials 


M£) 


N 


n 

1=0, l^k 


Cfc  -  fz  ’ 


(38) 
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in  Fig.  8  we  plot  hk  along  a  reference  element  up  to  2  -order.  A  4  -order  finite  element  and  corre¬ 
sponding  basis  function  are  plotted  in  Fig.  9  (left). 


Basis  functions:  Spectral  Elements.  Unlike  the  case  of  high-order  finite  elements,  the  polynomials 
used  with  spectral  elements  are  associated  with  zeros  that  are  not  equi-spaced.  A  convenient  set  is 
represented  by  the  Legendre-Gauss-Lobatto  (LGL)  points.  LGL  nodes  are  the  roots  of 

(i-£2)^(0  =  o,  (39) 

where  Pn(0  are  the  ~Nth -order  Legendre  polynomial  whose  construction  by  recursive  formulas  can  be 
found  in  [173].  The  polynomials  that  are  used  have  the  same  5-property  of  the  Lagrange  polynomials 
defined  above.  Their  analytic  expression  is  given  by 


MO 


(C2- 1)^(0 
N(N  +  l)(£-£k)PN(€)  ’ 


k  =  0, ...  ,7V, 


(40) 


where  P'  indicates  differentiation  with  respect  to  x'.  The  4  -order  ^-polynomials  along  I  =  [—1,1] 
are  plotted  on  the  right  panel  of  Fig.  9.  The  comparative  plot  (finite  element  on  the  left  and  spectral 
element  on  the  right)  is  used  to  show  that,  if  high-order  is  required,  equi-spaced  nodes  produce 
unsatisfactory  types  of  basis  functions  in  the  proximity  of  the  edge  points  of  the  element.  In  other 
words,  we  lose  control  on  the  maximum  and  minimum  values  of  h at  the  extrema  of  the  element. 
When  this  happens,  interpolation  of  any  function  is  likely  to  suffer  from  such  a  condition.  To  show 
how  this  feature  translates  into  the  interpolation  of  a  known  analytic  function,  we  use  the  following 
example  from  [115].  We  define  the  Witch  of  Agnesi  of  unitary  height  as 

Z^X)=  1.0  +  50  a;2  ’ 


where  z(x)  is  smooth  and  continuous,  and  interpolate  it  using  the  basis  functions  ip(x)  =  h].{x)  defined 
above.  The  test  is  performed  by  4  -order  interpolation.  Equi-spaced  and  non  equi-spaced  points  are 
used  along  the  unitary  domain.  Fig.  10  shows  that  the  more  the  polynomial  order  is  increased,  the 
better  the  result  is  when  LGL  nodes  are  employed.  This  is  tied  to  the  definition  of  the  Lagrange 
polynomials  and  their  interpolation  strength  given  by  the  Lebesgue  constant.  The  reader  is  referred 
to  [115]  for  more  details  on  this  issue.  Roughly  speaking,  this  analysis  serves  as  a  practical  way  of 
showing  one  reason  for  the  use  of  LGL  points  in  high-order  simulations  rather  than  high-order  elements 
with  evenly  distributed  nodes.  Fig.  11  is  a  schematic  representation  of  two  4  -order  elements  in  two 
dimensions. 


3.3  Discontinuous  Galerkin 


The  discontinuous  Galerkin  method  allows  the  numerical  solution  and,  therefore,  the  basis  functions 
to  be  discontinuous  at  the  interface  between  neighboring  elements.  For  this  reason,  the  basis  function 
is  no  longer  required  to  live  in  H 1  but,  rather,  in  L2.  Assume  m  is  the  number  of  elements  that  share 
grid  point  i.  Then  we  will  have  ni  different  values  of  the  solution  at  that  grid  point;  one  coming  from 
the  computation  on  the  left  element  and  one  on  the  right  element,  where  left  and  right  are  defined 
with  respect  to  the  shared  edge  (or  face  in  3D).  The  basis  functions  for  element  fie  vanish  everywhere 
outside  the  element.  Hence  Eq.  (31)  becomes  a  set  of  nei  independent  equations  for  each  element  f2e: 


if  [£{q)-S(q)]  df2e  =  0. 


(41) 


7  Alternatively,  the  basis  functions  can  be  constructed  using  Eq.  (38). 
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1st  order  Lagrange  Basis  Functions  2nd  order  Lagrange  Basis  Functions 


Fig.  8:  Lagrange  polynomials  of  order  1  (left)  and  2  (right)  along  the  ID  reference  element  I  =  [—1,1].  Clearly, 
they  are  equivalent  for  FE  and  SE. 


4th  order  Lagrange  Basis  Functions  4th  order  Lagrange  Basis  Functions 


Fig.  9:  Basis  functions  of  order  4  along  the  ID  reference  element  I  =  [—1,1].  Left:  the  nodes  within  the  element 
are  equi-spaced  as  for  classical  high-order  FE.  Right:  Lagrange-Legendre  polynomials  of  order  4  whose  roots  are 
the  non-equi-spaced  Legendre-Gauss-Lobatto  (LGL)  quadrature  points.  Nodal  SE  and  DG  may  employ  LGL  or 
LG  quadrature.  However,  to  obtain  a  diagonal  mass  matrix  then  LGL  is  the  only  choice  for  SE,  while  LG  can 
still  be  used  for  DG. 


The  equations  for  the  different  elements  are  coupled  by  means  of  the  fluxes  between  neighboring 
elements.  For  this  purpose  we  write  our  equations  in  flux  form: 


£(9)  =  §+v'f(9) 


(42) 
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Fig.  10:  Interpolation  of  a  known  function  (Witch  of  Agnesi)  using  high-order  interpolating  functions  with  equi- 
spaced  and  LGL  points.  Left:  4  -order  interpolation.  Right:  10  -order  interpolation 


Fig.  11:  Nodes  disposition  for  a  two-dimensional  4  -order  finite  element  (left),  and  spectral  element  (right). 


where  F  is  the  flux  tensor.  Eq.  (41)  becomes 

/  (ff+V-F-s)v’(x)dfle  =  0.  (43) 

with  F  =  F(g)  and  S  =  S(q).  Integration  by  parts  leads  to 

J ^  (^-F-V-5j^(x)d«e  =  -^  V’Wn-FdTe,  (44) 

where  re  is  the  boundary  of  fle  and  n  is  the  outward  pointing  unit  normal  vector  on  re.  To  compute 
the  numerical  solution,  qh ,  we  replace  the  flux  in  the  boundary  integral  by  a  so  called  numerical  flux 
F  h*: 

j  (^r-Fh-V-Sh^^)<We  =  -  J  ij)  (x)  n  ■  Fh*dre, 


(45) 
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with  F/l  =  F  (gft  )  and  Sh  =  S  ( qh ).  The  numerical  flux  F^1*  describes  the  flux  through  the  discon¬ 
tinuous  interface  between  neighboring  elements  in  the  same  way  of  the  finite  volume  (FV)  method; 
therefore,  we  can  choose  any  of  the  fluxes  that  are  used  with  FV.  For  an  introduction  to  different 
choices  of  fluxes  we  refer  to  [302].  Unlike  FV,  DG  is  relatively  insensitive  to  the  choice  of  the  numer¬ 
ical  flux  due  to  the  high  order  (p  >  3)  basis  functions  that  are  used  within  the  element.  Therefore  a 
common  choice  for  the  numerical  flux  is  the  simple  Rusanov  flux: 

F'1*  =  \  [F  (<?£)  +F  (4)  -  An  (<&-<*£)],  (46) 


where  A=  ||ti||2  +  c  is  the  maximum  wave  speed,  ||w||9  =  s/u2  +v2  +  w2,  and  c  is  the  speed  of  sound. 
The  subscript  L  denotes  the  index  of  the  element  f2e  whereas  the  subscript  R  denotes  the  index  of  the 
neighboring  element.  There  are  recent  approaches  to  incorporate  fluxes  that  are  not  perpendicular  to 
the  interface  between  elements  [328]. 

The  integration  by  parts  of  Ecp  (45)  leads  to 


J  (j^  +  V  Fh-Sh^rP(x)dne  =  J  if  (x)  n  •  (Fh  —  Fh* )  cLTe. 


Using  the  expansion  (29)  of  the  numerical  solution  gives  us 

9^  =  ~  [  &(x)(V-F h-Sh)df2e+  f  i>k(x)n-(Fh-Fh*)dre, 

J  iQg  J  /g 


(47) 


(48) 


with  the  definition  (x)  =  i[)k  (x)  where  Mik  =  J ^  dfle  are  the  components 

of  the  mass  matrix  M. 

Second  order  derivatives  in  the  differential  operator  C  can  be  discretized  with  DG  by  transforming 
the  problem  into  a  coupled  set  of  equations  containing  only  first  order  derivatives  as  done  in  [54].  This 
approach  is  called  the  local  DG  (LDG)  method.  Other  choices  for  discretizing  second  order  operators 
are  given  in  [266]. 


3.4  EBG  methods  in  atmospheric  and  climate  modeling 
3-4-1  Continuous  Galerkin 

The  use  of  continuous  Galerkin  methods  in  atmospheric  simulations  began  five  decades  ago  with 
the  work  on  finite  elements  by  Holmstrom  [137]  and  Simons  [269]  in  the  60s.  This  continued  in  the 
70s  (e.g.  [101;  69;  70])  and  was  followed  by  an  extensive  production  of  articles  in  the  80s  and  90s 
with,  e.g.,  Staniforth,  [279],  Beland  et  al.  [19],  or  Burridge  et  al.  [41],  who  set  the  foundations  of  the 
operational  Global  Environmental  Multiscale  (GEM)  model  [65;  327]  of  the  Canadian  Meteorological 
Center  &  Meteorological  Research  Branch  (CMC-MRB).  In  the  UK,  Untch  and  Hortal  [305]  used 
finite  elements  for  the  vertical  discretization  of  a  semi-Lagrangian  transport  scheme  and  introduced  it 
in  the  operational  version  of  the  European  Centre  for  Medium- Range  Weather  Forecasts  (ECMWF) 
global  spectral  model  (IFS),  with  great  improvement  with  respect  to  the  FD  version  of  the  code.  In  the 
domain  of  Geophysical  Fluid  Dynamics,  more  Galerkin-type  models  have  appeared  since  the  beginning 
of  the  new  millennium.  In,  e.g.,  [34;  193;  228]  or  [119],  different  variational  formulations  mostly  based 
on  spectral  elements  are  employed  to  solve  the  shallow  water  equation  or  the  Navier-Stokes  and  Euler 
equations  for  non-hydrostatic  atmospheres.  More  examples  of  element-based  models  are  the  SE-Core 
[296],  CAM-SEM  by  [74],  the  SE/DG  Nonhydrsotatic  Unified  Model  for  the  Atmosphere  (NUMA) 
[176;  117],  the  SEM  Community  Earth  System  Model  (CESM)  [73],  the  finite  element  multi  physics 
model  ALYA  in  atmospheric-mode  [212;  213]. 
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Possibly,  the  spectral  element  method  is  the  most  common  EBG  method  used  today  to  develop 
the  next  generation  research  NWP  models.  Spectral  elements  were  first  introduced  in  geophysical  fluid 
dynamics  by  Ma  in  1993  [204].  Ma  built  on  the  pioneering  work  of  Patera  [235],  who  developed  the 
spectral  element  method  for  incompressible  CFD  and  developed  an  ocean  model  based  on  the  shallow 
water  equations.  In  particular  Ma  stressed  the  ability  of  SEM  to  1)  accurately  simulate  flows  with  high 
Rossby  numbers  and  2)  simulate  phenomena  with  long  time  durations  due  to  SEM’s  low  dissipation 
and  dispersion  error.  Although  Ma’s  primary  application  was  coastal  ocean  modeling,  he  was  explicitly 
aware  of  the  intimate  connection  between  ocean  models  and  atmospheric /climate  models  and  predicted 
that  his  work  would  serve  as  a  basis  for  atmosphere  and  climate  studies.  Iskandarani  [158]  built  on 
Ma’s  ocean  model,  showing  that  the  accuracy  of  spectral  elements  successfully  suppressed  spurious 
pressure  modes  in  ocean  flows.  Both  Ma’s  and  Iskandarani’s  work  utilized  Cartesian  grids  suitable  for 
oceanographic  problems.  Two  years  later,  Taylor,  Tribbia,  and  Iskandarani  [294]  developed  the  first 
SWE  spectral  element  model  using  spherical  geometry.  In  particular,  this  work  used  a  cubed  sphere 
with  quadrilateral  elements  which  built  on  the  geometrical  flexibility  of  spectral  elements.  The  cubed 
sphere  grid  circumvented  the  well-known  pole  problem  that  is  present  for  traditional  latitude-longitude 
grids  (we  will  get  back  to  this  point  shortly).  At  the  same  time,  a  spectral  element  shallow  water  code 
was  developed  by  Haidvogel  et  al. [131] .  As  a  result,  this  methodology  was  extended  to  solving  the 
hydrostatic  primitive  equations  on  the  sphere  in  the  Spectral  Element  Atmospheric  Model  (SEAM) 
[98].  By  this  time,  massively  distributed  memory  clusters  had  become  available,  thus  motivating  the 
development  of  highly  scalable  numerical  methods;  Fournier  and  coworkers  noted  the  high  parallel 
efficiency  of  spectral  elements,  thus  making  SEM  a  suitable  numerical  method  for  the  dynamical  core 
of  climate  models,  which  are  computationally  expensive.  Taylor’s  SEM  solver  later  became  the  basis 
for  the  NCAR’s  high-order  method  modeling  environment  (HOMME),  which  facilitated  the  rapid 
development  of  next  generation  atmospheric  global  circulation  models  (AGCM). 

Taylor’s  SEM  model  utilized  spherical  coordinates  to  solve  atmospheric  problems  on  the  sphere; 
however,  since  the  sphere  is  a  sub-manifold  of  three-dimensional  space,  Cartesian  coordinates  can 
also  be  used  to  solve  problems  on  spheres  provided  that  the  fluid  is  constrained  to  he  on  the  sphere 
using  a  Lagrange  multiplier  [63].  Although  computationally  more  expensive  because  there  is  an  extra 
degree  of  freedom,  this  approach  has  two  advantages  over  spherical  coordinates:  1)  analytical  Jacobian 
transformations  for  the  grid  do  not  need  to  be  derived  and  2)  any  spherical  grid  (including  the  cubed 
sphere  grid)  may  be  utilized,  thereby  liberating  the  solver  from  the  grid.  Giraldo  utilized  this  Cartesian 
SEM  approach  to  solve  the  SWE  using  an  Icosahedral  grid  in  [113].  Collaborating  with  the  Naval 
Research  Lab,  he  applied  this  framework  to  solve  the  hydrostatic  primitive  equations  in  [119]  and 
develop  the  U.S.  Navy’s  spectral  element  atmospheric  model  (NSEAM)  including  a  semi-implicit  solver 
[114].  As  we  shall  see  in  the  next  section,  Giraldo  and  coworkers  developed  DG  concurrently  with  SEM 
solvers,  thereby  exposing  the  common  themes  and  machinery  shared  by  the  two  methods.  To  prove 
how  arbitrary  grids  can  be  used  to  solve  the  SWE  on  the  sphere,  a  unified  continuous/discontinuous 
Galerkin  model  has  been  recently  presented  in  [211].  Using  this  model,  the  equations  were  solved  in 
Cartesian  coordinates  using  static  and  dynamic  adaptivity  using  both,  continuous  and  discontinuous 
Galerkin  approximations  on  the  grids  illustrated  in  Fig.  12. 


3-4-2  Discontinuous  Galerkin 

This  subsection  presents  a  short  overview  of  the  important  steps  in  the  historical  development  of 
DG  towards  atmospheric  applications.  A  more  general  overview  of  the  history  of  DG  until  the  year 
2000  can  be  found  in  [52].  Some  information  can  also  be  found  in  the  textbook  by  Hesthaven  and 
Warburton  [133]. 

The  discontinuous  Galerkin  method  was  first  introduced  by  Reed  and  Hill  in  1973  [248].  Reed 
and  Llill  were  working  on  the  solution  of  the  stationary  linear  transport  equation  of  neutrons  with  a 
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Fig.  12:  Examples  of  spherical  grids  for  the  solution  of  the  SWE.  From  left  to  right,  classical  cubed-sphere,  a 
reduced  longitude-latitude,  and  icosahedral  grid.  These  are  high  order  grids  with  curved  elements  on  the  spherical 
shell.  Figures  adapted  from  [211]  with  permission  of  John  Wiley  Sz  Sons. 


constant  velocity  v  in  two  dimensions: 


dij)  di>  . 

[i~dx  JrT]~dy  +ayjyx’y^,V)  =  &  (£,?/, /x,?7) , 


(49) 


where  ip  is  the  angular  neutron  flux  in  the  direction  =  v/| |v| I2,  the  total  macroscopic  cross- 

section  for  neutron-nucleus  interaction  a  and  the  source  term  S.  The  source  term  describes  scattering, 
fission  and  inhomogeneous  sources.  This  transport  equation  was  solved  by  Reed  and  Hill  on  a  triangular 
mesh.  They  compared  a  method  allowing  discontinuities  at  the  interfaces  between  different  triangles 
with  a  continuous  method  and  found  that  DG  was  computationally  more  expensive  but  slightly  more 
accurate  and  much  more  robust.  One  of  the  main  advantages  of  the  discontinuous  method  was  that  it 
showed  fewer  oscillations  at  the  boundary  between  areas  with  two  different  values  of  the  cross-section 
a.  This  allowed  Reed  and  Hill  to  reduce  the  resolution  of  the  method  while  still  obtaining  a  less 
oscillatory  result  than  the  continuous  method  allowed. 

The  discontinuous  method  introduced  in  [248]  was  analyzed  theoretically  by  Lesaint  and  Raviart 
[198].  I11  this  early  work  the  discontinuous  Galerkin  method  was  applied  to  linear  equations.  The 
first  application  of  DG  to  nonlinear  conservation  equations  is  attributed  to  Chavent  and  Salzano 
[48].  Chavent  and  Salzano  used  first  order  polynomials  for  spatial  discretization  and  a  simple  explicit 
Euler  method  for  time  discretization.  A  von  Neumann  analysis  showed  that  this  approach  is  unstable 
if  the  time  step  At  is  proportional  to  the  grid  spacing  Ax.  This  approach  becomes  stable  only  for 
At  tx  Ax3/2.  This  severe  restriction  for  the  time  step  with  explicit  time  integration  was  solved  by  the 
development  of  Runge-Kutta  discontinuous  Galerkin  methods  (RKDG)  by  Cockburn  and  Shu  [53]. 

Discontinuous  Galerkin  methods  were  applied  to  parabolic  equations  by  Jamet  in  1978  [163], 
displacement  of  oil  by  water  in  a  porous  slab  by  Chavent  and  Salzano  in  1982  [48],  viscoelastic  flows 
by  Fortin  and  Fortin  in  1989  [97]  and  to  the  solution  of  the  Maxwell  equations  by  Warburton  and 
Karniadakis  in  1999  [312]. 

The  first  numerical  experiment  using  DG  for  the  Euler  equations  of  gas  dynamics  is  reported  in 
the  1991  work  by  Bey  and  Oden  [26]  and  by  Bassi  and  Rebay  in  1997  [12;  13].  The  first  application 
of  DG  to  geophysical  problems  started  with  the  work  on  shallow  water  equations  by  Schwanenberg  et 
al.  [264],  followed  by  Giraldo  et  al.  [116]  who  introduced  inexact  integration  for  DG  and  extended  this 
to  the  sphere.  In  2008,  DG  was  finally  used  to  solve  the  Navier-Stokes  equations  of  non-liydrostatic 
atmospheric  flows  by  Giraldo  and  Restelli  [118].  Discontinuous  Galerkin  methods  have  not  been  used 
in  operational  global  circulation  models  (GCM)  yet.  However,  a  hydrostatic  GCM  was  presented  by 
Nair  et  al.  in  2009  [225],  followed  in  2011  by  the  German  DG  model  DUNE  [34],  and,  in  2012,  by  the 
scalable  non-hydrostatic  model  by  Kelly  and  Giraldo  [176].  The  linear  scalability  properties  of  DG 
was  shown  by  Wilcox  et  al.  in  2010  [320]  and  Kelly  and  Giraldo  in  2012  [176]. 
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As  mentioned  earlier,  the  geometrical  flexibility  of  EBG,  including  the  potential  for  adaptivity,  is  a 
major  strength  of  EBG  methods-in  particular,  DG.  An  adaptive  DG  model  based  on  non-conforming 
quads  was  introduced  in  [184].  This  paper  introduced  a  tree- based  adaptive  mesh  refinement  (AMR) 
strategy,  demonstrated  the  potential  for  DG  to  achieve  order-of-magnitude  efficiency  gains,  which  are 
difficult,  if  not  impossible,  with  traditional  finite  difference  or  spectral  transform  methods.  Dynamic 
adaptivity,  using  conforming  triangular  elements,  was  also  explored  in  [224].  These  two  papers  are  not 
the  first  ones  that  report  on  grid  adaptivity  using  EBG;  however,  they  seem  to  be  the  first  publications 
on  this  topic  in  the  framework  of  non-hydrostatic  atmospheric  simulations  using  DG. 


4  Stabilization  of  EBG  for  advection-dominated  problems 

The  straight  numerical  approximation  of  problems  with  dominating  advection  may  result  in  unphysical 
oscillations  in  the  solution.  Finite  and  spectral  element  methods  are  no  exceptions  [170]  and  an  error 
estimate  of  the  standard  Galerkin  approximation  of  the  problem  proves  it  (see,  e.g.,  [247]).  Here,  we 
show  it  by  deriving  the  ID  finite  element  solution  of  the  advection-diffusion  problem  with  Diriclilet 
boundary  conditions.  The  problem  consists  in  solving 

ft+C(q)  =  S,  (50) 

where 

£{q)  =  u  •  Vq  -  V  •  (vVq)  ,  (51) 

by  linear  (p  =  1)  finite  elements.  In  (65),  v  is  a  positive,  uniform,  constant  diffusivity  coefficient, 
u  =  (u,0,0)  is  the  velocity  vector,  and  S  is  a  source  function.  The  domain  of  interest  is  the  unit  interval 
12  =  [0, 1] .  A  uniform  partition  Vh  of  12  with  N  + 1  nodes  pk,  k  =  0, . . . , TV,  and  nei  =  N  elements  K 
of  length  h  =  ||pfe  —  Pfc-i  H2  is  assumed.  For  uniqueness  of  the  solution,  q( 0)  =  0  and  g(l)  =  1  are  the 
assigned  boundary  conditions.  Let  Wh  C  H 1  be  the  space  of  piece-wise  linear  Lagrange  polynomials 
of  class  C°  (Fig.  8,  left.)  The  projection  of  Ecp  (50)  onto  Wh  by  the  L 2  scalar  product  and  integration 
by  parts  of  the  diffusion  term  yields  the  equation 

[  iPhu-Vqhdnh+  I  vV4’h-Vqhdnh  =  I  iphSdtth  Wi/jheWh,  (52) 

J nh  J  nh  J  nh 


qh  is  expanded  by  (29).  When  S  =  0,  the  ID  finite  element  discretization  of  (52)  yields  the  discrete 
equation 

(I  “04+1  +  T4“(l  +  04-1  =  O’  k  =  l...,N-l.  (53) 

Ecp  (53)  is  equivalent  to  the  ID  discretization  of  the  same  problem  by  second-order  finite  differences. 
After  algebraic  manipulation  and  given  the  definition  of  the  element  Peclet  number 


Pe  = 


u||  h 
2v 


(54) 


Ecp  (53)  is  written  as  a  function  of  (54): 


(Pe-1)  qk+1  +  2qk-{Pe+l)  qk~i=0,  k  =  1, . . . , N-l.  (55) 

It  represents  a  tridiagonal  linear  system  in  the  unknowns  qk ,  k  =  1,7V  —  1,  whose  solution  is  the 
function  (see  [246]) 
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Fig.  13:  Finite  element  solution  of  the  advection  diffusion  problem  (50)  using  uniform,  linear  elements,  u  =  10, 
is  =  0.1,  in  a  domain  of  unitary  total  length.  With  these  values,  the  global  Peclet  is  Peg  =  50.  The  plot  shows  the 
approximate  solutions  obtained  for  different  grid  spacing  (Pe  =  2.5  and  Pe  =  0.625)  with  and  without  stabilization. 
It  is  shown  how  the  computed  solution  can  approach  the  exact  solution  by  either  increasing  the  number  of  grid 
points  (Pe  =  0.625),  or  by  maintaining  the  grid  sufficiently  coarse  but  with  the  addition  of  a  stabilizing  term 
(How  this  term  is  built  has  not  been  shown  yet,  but  the  result  gives  a  hint  on  what  to  expect  from  it). 


l+Pe^_i 

’N - ,  k  =  l,N-l.  (56) 

1  +Pe\N  i 
1  -Pe! 

The  power  of  (1+Pe)/(1-Pe)  at  the  numerator  produces  an  oscillatory  behavior  of  the  solution  when¬ 
ever  Pe  >  1,  as  it  is  shown  in  Fig.  13.  Pe  is  a  linear  function  of  h  so  that  the  grid,  in  principle,  could 
be  always  constructed  in  such  a  way  that,  for  a  given  value  of  u  and  v,  Pe  <  1.  However,  this  is 
not  viable  for  most  real  problems  because  of  the  extremely  high  number  of  grid  points  that  may  be 
necessary  to  achieve  such  a  condition.  The  only  way  to  solve  the  problem  of  numerical  instabilities 
in  the  Galerkin  solution  of  transport  problems  with  dominant  advection  remains  that  of  stabiliza¬ 
tion  by  proper  means.  A  certain  category  of  stabilization  methods  applied  to  the  multi-dimensional 
advection-diffusion  equation  will  be  described  in  the  next  section. 


4.1  Viscosity-based  stabilization  techniques 

Regardless  of  the  numerical  method  that  an  atmospheric  model  is  built  on,  dissipation  of  some  sort  is 
added  for  various  reasons;  stabilization  is  one  of  them.  As  is  pointed  out  in  [161],  the  most  common 
mean  of  dissipation  that  is  found  in  current  research  and  operational  weather  forecast  models  is 
artihcial  diffusion  (or  hyper-diffusion,  HV  from  now  on)  in  the  form  of 

HV=  [  (— 1  )a+1i/.’hVa  ■  {v2aVaqh)dnh  (57) 

where  a  is  a  positive  integer  and  zz2q,  is  the  matrix  of  the  diffusivity  coefficients  that  may  vary  along 
different  grid  directions  [125].  When  a  =  1,  HV  reduces  to  second-order  Artificial  Viscosity  [169]. 
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(Min, Max)  =  (0.0,  0.9482)  (MllxMax)  -  (-0.0739. 1.1052)  (Min.  Max)  =  (-4.125C-06, 1) 


Fig.  14:  Pure  transport  of  a  square  wave  in  a  2D  doubly-periodic  channel.  The  velocity  is  directed  along  the  x-a.xis 
(the  bottom-right  edge  of  the  squared  domain  in  the  three  plots).  From  left  to  right:  stabilization  achieved  by  V2, 
V4,  and  using  a  variational  multiscale  scheme  [143].  Adapted  from  Fig.  21  of  [210],  with  permission  by  Elsevier. 


HV  is  easy  to  implement  and  is  robust.  These  are  features  that  make  it  attractive  for  models  that 
are  not  allowed  to  break  during  a  forecast  simulation.  HV  is  found  in  other  fields  of  computational 
fluid  dynamics  as  well;  the  work  by  [11]  is  an  example  where  HV  is  used  to  stabilize  the  simulation 
of  high  speed  flows.  One  justification  that  practitioners  in  NWP  give  to  HV  for  a  >  1  is  its  scale 
selectiveness;  it  damps  higher  order  frequencies  that  are  usually  the  result  of  numerical  error  and 
dispersion,  but  tends  to  leave  the  important  and  physical  modes  untouched.  However,  even  if  HV  is 
indeed  scale-selective,  it  is  not  physical.  Since  the  artificial  term  given  by  Ecp  (57)  is  a  perturbation 
to  the  original  equation,  if  this  perturbation  does  not  go  to  zero  as  h  — >  0,  the  exact  solutions  of  the 
original  and  of  the  perturbed  problems  are  not  equivalent.  As  is  evident  in  Fig.  14,  these  methods  add 
an  uncontrolled  and  non-localized  diffusion  that  yields  a  certain  deterioration  of  the  solution.  For  a 
stabilizing  scheme  to  preserve  the  shape  of  the  tracer,  dissipation  should  be  avoided  in  the  direction 
normal  to  the  flow  and  only  act  in  the  direction  parallel  to  the  flow  [146;  152;  55]. 

To  preserve  the  correct  physical  dimensions  of  the  hyper-viscous  term,  the  value  of  v% a  must  scale 
with  respect  to  a  and  the  grid  spacing.  Its  selection  not  only  is  non-trivial,  but  has  a  great  impact 
on  the  solution  of  the  problem.  Jablonowski  and  Williamson  [161]  clearly  state  that  "[...]  the  choice 
of  the  V2,V4  coefficient  is  most  often  motivated  by  empirical  arguments  and  chosen  in  a  somewhat 
arbitrary  manner  [...]."  More  advanced,  and  by  now  classical,  stabilizing  schemes  for  finite  elements, 
spectral  elements,  and  discontinuous  Galerkin  are  described  in  the  following  subsections. 


4.2  Filtering  of  (high-order)  EBG  Methods 

Both  CG  and  DG,  like  all  higher-order  methods,  are  limited  by  Godunov’s  Theorem:  all  linear  numer¬ 
ical  methods  for  solving  PDEs  that  do  not  generate  additional  extrema  (so-called  monotone  schemes), 
are  all  first-order  accurate.  As  an  immediate  consequence,  high-order  CG  and  DG  are  not  monotonic¬ 
ity  preserving,  especially  so  near  sharp  gradients.  In  NWP,  this  problem  is  especially  problematic  for 
tracer  transport,  where  mass  fractions  may  become  negative  due  to  these  artificial  extrema.  Finally, 
spectral  elements  and  discontinuous  Galerkin  methods  on  quadrilateral  and  hexahedral  elements  typ¬ 
ically  use  inexact  integration  to  diagonalize  the  mass  matrix;  this  approximate  integration  introduces 
errors  which  must  be  stabilized  by  a  filter  or  a  more  sophisticated  scheme  such  as  the  VMS  method 
discussed  later. 

To  circumvent  these  problems,  filters  were  introduced  in  the  development  of  both  spectral  methods 
[96]  and,  later,  spectral  elements  in  [32;  94].  They  were  also  applied  to  discontinuous  Galerkin  methods 
in  [116;  118].  Filters  reduce  the  aliasing  that  occurs  in  the  higher-order  modes  of  the  solution  that 
are  largely  responsible  for  Gibbs  oscillations;  hence,  filters  act  upon  the  modal  representation  of  the 
solution.  Once  the  offending  high  modes  are  eliminated,  the  modal  solution  is  inverse  transformed  to 
physical,  or  nodal  space.  Hence,  the  spectral  filtering  operation  consists  of  a  three-step  process:  1) 
transform  the  nodal  solution  to  a  modal  solution,  2)  apply  a  low-pass  filter  to  eliminate  the  largest 
spatial  frequencies,  and  3)  inverse  transform  the  filtered  modal  solution  to  nodal  space.  For  SE  and 
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DG  that  utilize  LGL  basis  functions,  a  modified  Legendre  transform  may  be  utilized  for  steps  1  and 
3;  in  addition,  it  is  possible  to  perform  these  operations  on  an  element  by  element  basis,  thereby 
eliminating  the  need  for  a  global  assembly  operation. 

Ideally,  the  spectral  filter  applied  during  step  2  should  eliminate  all  non-physical  oscillations  while 
faithfully  preserving  the  physics  of  the  solution.  In  practice,  satisfying  both  of  these  requirements  is 
not  possible.  An  effective  filter  was  developed  by  Boyd  [31;  32],  based  on  previous  theoretical  work 
by  Vandeven  [307],  resulting  in  an  erfc-log  filter  for  polynomials  of  order  p.  Letting  9  =  k/p—  1/2,  the 
filter  a(k;p)  is  equal  to  unity  if  9  <  0  and 


<Kfc;p)  =  2erfc 


2  VPO 


—  log  (l  —  402) 

4 P 


(58) 


Eq.  (58)  rapidly  eliminates  all  higher-order  modes,  and  completely  eliminates  the  highest  order  node. 

Most  DG  methods  utilize  exact  integration  and  therefore  may  not  require  filter-based  stabilization. 
It  should  be  kept  in  mind  that  filtering  may  not  be  sufficient  to  stabilize  the  solution;  for  this  reason, 
the  dissipation  schemes  described  so  far  and  later  in  this  section  are  often  considered  as  a  possible 
option  for  high  order  methods  as  well  [44;  45].  Finally,  it  is  difficult  to  derive  idempotent  filters;  when 
the  filter  is  not  idempotent  [172],  the  solution  may  vary  based  on  how  many  times  the  filter  is  applied 
along  the  simulation. 


4.3  Towards  consistent  stabilization  methods 

Streamline  Upwinding  (SU)  [146],  Streamline  Upwind  Petrov- Galerkin  (SUPG)  [38],  Galerkin/ Least- 
Squares  (GLS)  [150],  Galerkin  methods  with  bubble  functions  [35;  9;  37],  or  sub-grid  projection  meth¬ 
ods  [126]  are  some  of  the  most  used  stabilization  techniques  for  finite  elements.  To  bypass  some 
drawbacks  of  streamline-type  schemes  such  as  these,  much  work  was  done  in  the  same  years  on  shock 
capturing,  as  found  in  [170;  171;  132].  The  Taylor-Galerkin  method  [78],  the  Characteristic-Galerkin 
formulation  [241],  and  the  Characteristic-Based  Split  (CBS)  method  [333;  332]  are  more  ways  for 
FE  stabilization  that,  however,  rely  on  a  reasoning  that  has  no  relationship  with  the  methods  we 
are  interested  in  reviewing  in  this  article.  We  mention  them  here  but  we  will  not  delve  into  their 
description. 

Streamline-upwind  (SU).  The  problem  of  isotropic  smearing  of  the  solution  mentioned  above  was 
partially  solved  by  [146]  with  the  construction  of  the  Streamline-upwind  method,  although  the  idea 
of  finite  element  upwinding  can  already  be  found  one  year  earlier  with  the  work  by  [287]  and  then 
continued  in  [288;  289].  With  SU,  stabilization  is  projected  in  the  direction  of  the  flow  only,  as  visible 
from 


bsu=  /  ru  ■  Vtphu  ■  S7qh  dfih .  (59) 

However,  like  HV,  SU  is  not  numerically  consistent  either  in  the  sense  that  no  residual  information  is 
used  to  construct  this  perturbation  term.  The  Streamline-upwind/Petrov-Galerkin  (SUPG)  method 
described  below  is  the  consistent  evolution  of  SU  and  will  be  among  the  most  common  methods  of 
stabilization  of  finite  elements  used  since  its  introduction. 

Streamline-upwind/ Petrov- Galerkin  (SUPG).  The  SUPG  method  was  designed  by  [38]  and  was  later 
generalized  for  multidimensional  problems  by  [151].  It  is  a  consistent  alternative  to  the  HV  approach 
or  to  the  overly  diffusive  SU.  Its  use  has  been  ubiquitous  in  the  solution  of  transport  problems  by 
the  finite  element  method  (e.g.,  [156;  100;  35;  295]).  The  application  of  this  strategy  to  higher-order 
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schemes  was  first  tested  for  spectral  methods  by  Canuto  and  colleagues  in  [43],  [44],  [46],  [45],  and 
later  by  Hughes  and  co-workers  in  [148]  using  non-uniform  rational  B-splines  (NURBS).  SUPG  is  a 
Petrov-Galerkin  method  in  that  it  does  not  assume  that  the  basis  and  test  functions  live  in  the  same 
space.  We  introduce  the  additional  space  \Ph  of  test  functions  wh  defined  by 

9h  =  {wh  :  wh  =  tph  +  TU-S7ifh  :  iph  e  Wh) . 

We  have  the  problem  of  Ending  the  function  qh  £  Wh  such  that 

[  whu  Vqhdnh-  [  whV  ■  (vV  qh)  dfih  =  f  wh  Sh  dflh  VivheVh.  (60) 

J  nh  J  nh  Ji ih 

Some  algebra  and  rearrangement  of  (60)  yields  the  problem  of  finding  qh  £  Wh  such  that 


ij)hu-Vqhd{2h+  [ 
Jf2h 


Galerkin 


vV%l>h-Vqhdnh+bSUPG 


i>h  Sh  dflh 


Galerkin 


\/tph  £  Wh, 


(61) 


where 


bsuPG=  (u  ■  t  [u  •  —  V  •  —  S']  dfih  (62) 

J  n  - - v 

C(  qh)-S 

is  the  consistent  SUPG  stabilizing  term.  In  (62),  u  •  S7qh  —  V  •  (yVqh^  —  S  is  the  residual  of  (50)  and 
t  is  the  stabilization  parameter.  The  definition  of  r  that  yields  a  nodally  exact  SUPG  solution  with 
continuous  piecewise  linear  finite  elements  is  derived  in  [50]  from  ID  analysis.  Its  generalization  to 
multi-dimensional  problems  is  given  by  the  simple  substitution  of  u  with  ||u||,  although,  in  multi- 
dimensions  this  does  not  necessarily  imply  nodal  exactness.  With  respect  to  higher  order  elements, 
a  thorough  analysis  of  r  for  quadratic  elements  is  given  by  [59].  In  the  context  of  VMS,  different 
definitions  exist  for  parameter  r,  some  of  them  are  discussed  in  4.4.  For  a  brief  review  on  SUPG,  the 
reader  is  also  referred  to  [145]  and  the  report  [102]. 


Galerkin/Least-square  (GLS).  A  generalization  of  SUPG  was  obtained  by  [150]  as 

bGLs=  I  [u-Viph-V-(vViph)]T  [u-S7qh-V-(vVqh)-S]  dQh.  (63) 

J  S2' - v - '  ' - v - ' 

£«’h)  £(qh)-S 

In  analogy  with  the  findings  of  [82]  to  stabilize  the  Stokes  equation,  a  sign  change  in  the  Laplace  term 
of  the  stabilizing  term  in  the  perturbed  equation  proved  to  yield  better  stabilization  characteristics 
(more  accurate  results)  than  the  original  generalized  SUPG  (or  GLS)  method  [100].  In  (63),  for  better 
properties,  instead  of  using  the  differential  operator  C,  the  method  should  use  the  negative  part  of 
the  adjoint  C*  of  the  original  operator  C.  We  have  that  the  last  perturbation  term  of  the  original  AD 
equation  should  be 


b=-  I  C*{iph)T  [u ■S7qh-S7-(vS7qh)-S]d(2h, 

J  Q  ' - - ' 


£(qh)-S 


(64) 


where 


C*  =  — u  •  V  —  V  •  (pS7) 


(65) 
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is  the  adjoint  of  C. 

Based  on  what  was  learned  on  stabilization  of  the  scalar  advection-diffusion  equation,  researchers 
in  fluid  dynamics  applied  these  methods  and  their  evolution  to  the  stabilization  of  fluid  problems. 
Thanks  to  the  work  of  [143;  149],  the  methods  that  we  have  just  described  have  been  recognized  to 
belong  to  the  same  family  known  as  the  family  of  Variational  Multiscale  Stabilization,  or  VMS.  The 
VMS  approach  is  summarized  below  for  scalar  advection  problems,  and  will  be  derived  and  discussed 
for  the  Euler  equations  in  Section  4.7. 


4.4  Variational  Multiscale  Stabilization  (VMS) 

In  1995  and  1996,  groups  of  researchers  lead  by  Hughes  [143]  and  Brezzi  [36]  proposed  a  theory  to 
explain  the  reasons  of  instabilities  and  a  new  way  to  attack  the  problem.  They  concluded  that  the  un¬ 
resolved  scales  (the  scales  that  cannot  be  captured  by  the  computational  grid)  are  responsible  for  the 
numerical  instabilities  of  the  Galerkin  solution  of  the  differential  problem.  The  analysis,  that  continues 
with  [155]  and  [149],  forms  the  unifying  theory  of  all  stabilized  finite  element  methods.  According  to 
this  theory,  stabilized  methods  are  subgrid  scale  models  where  the  unresolved  scales  are  intimately 
related  to  the  instabilities  at  the  level  of  the  resolved  scales,  and  thus  should  be  used  in  the  construc¬ 
tion  of  the  stabilization  term.  More  specifically,  in  the  formulation  of  the  discrete  problem,  the  effects 
of  the  unresolved  scales  must  be  introduced  by  modeling  them  on  the  grid.  These  schemes  are  known 
as  Variational  Multiscale  Stabilization  (VMS)  method.  Generally  speaking,  the  stabilization  term  of 
VMS  corresponds  to  b  defined  in  (64). 

VMS  has  been  extensively  applied  to  the  solution  of  the  advection-diffusion/advection-cliffusion- 
reaction  equations  (e.g.  [143;  149;  58;  61;  141]),  and  to  the  solution  of  the  Navier-Stokes  equations 
for  incompressible  flows  (e.g.  [153;  56;  57;  123;  17;  6]).  Recently,  it  was  applied  to  spectral  elements 
in  the  context  of  atmospheric  flows  in  [210;  209].  In  Section  4.7  a  review  of  VMS  for  the  compressible 
Euler  equations  is  found. 

The  multiscale  description  of  the  stabilization  scheme  relies  on  the  splitting  of  the  solution  into  a 
resolved,  qh,  and  a  sub-grid,  unresolved  component,  q,  to  give  q  =  qh  +q.  Let  W  =  Wh  ©IT  be  the 
space  decomposition  such  that  W  completes  Wh  in  W.  This  translates  into  the  decomposition  of  the 
solution  variables  q  =  qh  +  q,  and  of  the  basis  functions  tjj  =  iph  +  pp.  Substituting  the  decomposition 
into  the  general  weak  form  of  Ecp  (50), 

+a^,q^ =  W’’5')  Vip  eW,  (66) 

where  (•,•)  is  the  L2  inner  product  and  a(-,-)  is  a  bilinear  form  that  satisfies 

a{^,q)  =  (ip,uS7q)  +  u(Vpj,Vq), 

and  anticipating  that  we  will  consider  the  quasi-static  approximation  dtq  =  0  [57],  we  obtain: 

+  +  a(iph  +  ip,qh  +  q)  =  (iph  +  i>,S)  Viph  e  Wh,ji>  e  W.  (67) 

By  virtue  of  the  linear  independence  of  iph  and  we  can  first  take  p)  =  0  and  then  iph  =  0  and  find 
the  split  problem: 

(V\^)  +a(iPh,qh)  +  a(iPh,q)  =  Wh,S)  e  Wh  (68a) 

It)  =  (^-s)  v'*/’ e  w. 


(68b) 
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In  the  subgrid  Eq.  (68b)  we  come  back  to  the  original  differential  operator  £  from  Eq.  (50).  We 
assume  that  ip(dK)  =  0  and  q(dK)  =  0,  for  each  element  K  of  the  grid  and,  following  [143],  in  (68a) 
we  integrate  by  parts  the  bilinear  forms  that  depend  on  the  subgrid  scale  and  find: 


+  a^h,qh)  +  (C*^h,q)  =  ^h,S) 


\/iph  e  wh 


(69a) 


+  ($,Cqh)  +  (%l,Cq)  =  (V>,S) 


e  w, 


where  £*  (in  Eq.  (65))  is  the  adjoint  operator  of  £. 


(69b) 


4.5  Approximation  of  the  sub-grid  scales 

The  unresolved  quantity  q  has  not  been  defined  yet.  Eq.  (69b)  is  used  as  the  starting  point  to  approx¬ 
imate  q.  By  re-arranging  the  terms  in  (69b),  the  equation  for  the  subgrid  scales  is  found, 

$,Cm  =  $,R{<h)  V^€lT,  (70) 

where 

R(qh)  =  S-^-C(qh)  (71) 

is  the  residual  of  the  original  equation.  The  strong  form  of  (70)  is  considered  on  each  element  K 

m  =  R(qh ) ,  (72) 

and  r  «  £  ,  an  algebraic  approximation  of  the  inverse  of  the  differential  operator  £  is  defined.  Then 

the  sub-scale  has  the  form 

q  =  r R(qh)  .  (73) 

Expression  (73)  is  plugged  into  equation  (69a),  to  find  the  expression  for  the  VMS  stabilized  Galerkin 
method  as  follows:  Find  qh  6  Wh  such  that 

(^h,^+a^h,qh)  +  (C*(^h),TR(qh))  =  ^h,S)  Wh£Wh.  (74) 

Eq.  (74)  differs  from  Ecp  (66)  by  the  additional  term  that  models  the  subgrid  scales.  The  extra  term 
is  the  viscous-like  contribution  that  stabilizes  the  equation. 

Different  formulations  for  q  are  found  in  the  literature,  some  of  them  are  reviewed  in  4. 5. 1-4. 5. 3.  All 
of  them  depend  on  the  definition  of  the  stabilization  parameter  r.  The  parameter  r  is  a  topic  of 
active  research  still  today,  since  a  general  definition  is  not  known  [168].  This  statement  is  true  for 
all  the  residual-based  stabilization  methods  described  so  far.  The  quantity  r  is  an  intrinsic  time  that 
is  built  as  a  function  of  the  local  Peclet  number  of  the  flow  which,  for  stability,  should  respect  the 
condition  Pe  <  1.  Many  problems  in  atmospheric  CFD  are  advection  dominated,  implying  Pe  1, 
so  that  stabilization  is  indeed  necessary  for  all  the  problems  that  are  of  any  interest  for  atmospheric 
modelers. 
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4-5.1  Approximation  via  Green’s  functions 


This  approach  is  used  by  Hughes  and  collaborators  [143;  149;  144]  to  derive  r.  In  brief,  they  consider 
Eq.  (72)  with  q  =  0  on  dK,  and  the  associated  Green’s  function  problem  for  the  adjoint  operator 


£*g(x,y)  =  fi(x,y)  MxeK 
g  =  0  on  8K  . 


(75) 


Then  a  uniform  element-wise  definition  of  r  is  obtained  as  the  average  value  of  the  exact  element 
Green’s  function 


Wf  I  I  9<'x'y')  dKxdKy  ’ 


(76) 


where  \K\  is  the  measure  (volume/area/length)  of  the  domain.  For  the  one-dimensional  linear  scalar 
advection-cliffusion  equation  there  is  an  analytical  expression  for  the  Green’s  function  and  the  stabi¬ 
lization  parameter  is  computed  from  (76)  to  give 


1  h 

2  |Ju 


^coth(Pe) 


(77) 


for  the  Peclet  number  (54).  In  [147;  38]  the  same  expression  (77)  is  obtained  in  the  context  of  SUPG 
stabilization  by  following  an  error  minimization  criterion.  For  the  purely  advective  case  (u  =  0  and 
Pe  —>  oc),  we  find 


_  1  h 

2  Mil 


(78) 


Instead  of  (76),  Corsini  et  al.  [61]  propose  a  non-uniform  r  on  each  element: 


t(x) 


TTu  [  g(x,y)dKv. 


I K 


Ik 


(79) 


4-5.2  Approximation  via  Fourier  analysis 

The  strategy  of  Codina  et  al.  [58]  is  explained  here  for  the  multidimensional  advection-diffusion  Eq. 
(50).  The  starting  point  is  to  transform  Eq.  (72)  into  the  Fourier  space.  Which  is  interesting  because 

the  differential  operator  transform,  C ,  is  easy  to  invert.  Let’s  call  T  its  inverse,  T  =  (C)  ,  thus  the 

Fourier  transform  of  the  sub-scale  is  approximated  on  each  element  K  as 

IM  =  T(u)R(u),  (80) 


where  u>  is  the  wave  number  and 

7» 


7T  +  ^J 


-i 


Considering  expressions  (80)  and  (73),  the  Plancherel’s  formula  and  the  mean  value  theorem  are 
applied  to  obtain  an  approximated  value  for  the  stabilization  parameter  on  each  element  K  as 


1  h  Pe 

2  ||u||  Pe+  1 


(81) 


For  pure  advection  problems  {v  =  0  and  Pe  — »  oc),  the  stabilization  parameter  becomes  as  in  Ecp  (78). 
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4-5.3  Approximation  via  bubble  functions 


More  options  to  build  the  stabilization  parameter  r  are  found  in  [141]  for  linear,  quadratic,  and  cubic 
elements.  The  space  W  is  made  of  bubble  functions  (see  [9;  37]),  that  are  vanishing  functions  on  the 
boundaries  of  each  element.  The  unresolved  scales,  q,  are  defined  as  a  function  of  the  bubbles  b(x) 
that  are  derived  as  described  in  the  referenced  literature.  Omitting  the  details,  r  is  a  function  of  the 
bubble  as: 


r  = 


b{x)  dx , 


(82) 


that,  once  evaluated,  yields  the  expression  (77)  for  the  parameter  r;  the  same  expression  encountered 
with  the  Green’s  function  approach. 

On  the  steps  of  [141] ,  t  for  spectral  elements  of  arbitrary  order  and  with  unequally  spaced  element 
nodes  was  derived  in  [210].  The  stabilization  parameter  r  is  built  inside  the  element  as  a  function  of 
the  bubbles  on  every  segment  delimited  by  two  consecutive  LGL  points.  The  uneven  spacing  of  the 
element  nodes  is  the  major  difference  with  respect  to  the  definitions  derived  in  previous  studies.  In 
this  case,  the  intrinsic  time  is  non-uniform  along  the  element. 


4.6  Preserving  positivity 

SUPG,  GLS,  and  VMS  are  not  monotonicity  preserving.  This  issue  is  particularly  important  for  the 
simulation  of  tracer  dynamics  in  the  atmosphere.  If  over-  and  undershoots  affect  the  solution  in  the 
proximity  of  strong  gradients,  the  net  mass  balance  of  the  advected  tracers,  will  be  negatively  affected. 
To  overcome  this  issue,  a  controlled  crosswind  discontinuity  capturing  can  be  added  to  the  principal 
stabilization  scheme.  For  example,  the  method  introduced  in  [55]  was  successfully  adapted  to  high- 
order  spectral  elements  in  [210]  for  standard  2D  test  cases  and,  more  recently,  by  [209]  to  support 
positivity  in  the  solution  of  fully  3D  cloud  simulations.  For  comparison,  we  reproduce  Fig.  29  of  [210]  in 
Fig.  15.  In  the  case  of  high-order  SEM,  in  [210]  the  First  Order  Subcell  (FOS)  method  was  introduced. 
FOS  consists  in  lowering  the  high-order  method  to  first  order  in  the  spectral  elements  that  contain  the 
discontinuity  only.  The  FOS  results  are  encouraging,  although  there  is  some  overhead  coming  from 
looping  over  the  linear  sub-elements  within  the  liigh-order  elements  that  contain  the  localized  over- 
ancl  under-shoots. 

Other  positivity  preserving  schemes  such  as  high-order  limiters  for  both  CG  and  DG  are  often 
used  as  well,  as  is  shown  in,  e.g.,  the  report  by  [329]. 


4.7  VMS  stabilization  for  the  Euler  equations 


VMS  for  compressible  flows  appears  in  [251;  252;  222].  A  review  of  residual-based  stabilization  meth¬ 
ods  for  compressible  flows  can  be  found  in  [145].  Recently,  VMS  was  used  to  stabilize  the  FEM 
solution  of  the  Euler  equations  of  atmospheric  non-hydrostatic  flows  in  [212;  213].  VMS  was  derived 
for  discontinuous  Galerkin  as  well  [154]  although  it  has  not  been  applied  to  atmospheric  modeling. 
In  the  following,  we  then  limit  the  analysis  to  continuous  Galerkin  (without  distinguishing  between 
FEM  and  SEM  in  its  derivation.)  For  the  treatment  that  follows,  it  is  convenient  to  express  the  Euler 
equations  (i.e.,  the  inviscid  counterpart  of  system  (1))  in  compact  form  as 


<9q  <9F*(q) 

dt  dxi 


i  =  1,2,3, 


(83) 


where  the  Einstein  summation  on  the  repeated  indices  is  assumed,  where  q  is  the  vector  of  the 
unknowns,  and  F  is  the  vector  of  the  flux  quantities.  Without  compromising  the  generality  of  the 
stabilization  method,  gravity  and  Coriolis  are  here  omitted.  As  usual,  the  problem  consists  in  Ending 
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Fig.  15:  Stable  SEM  solutions  of  the  transport  equations  for  a  sharp  cylinder  that  is  transported  by  a  rising 
thermal  flow  (i.e.  the  tracer  is  driven  by  the  thermal  flow  that  is  modeled  by  the  Euler  equations  of  stratified 
flows.)  Top  left:  filtered  solution  using  the  filter  of  Section  4.2.  Top  right:  2nd-order  artificial  diffusion  without 
discontinuity  capturing.  Bottom  left:  VMS.  Bottom  right:  VMS  with  discontinuity  capturing.  Adapted  from  [210], 
with  permission  by  Elsevier. 


q(x,£)  that  verifies  Eq.  (83)  for  all  (x,t)  6  4?  x  M+.  To  proceed  and  derive  VMS  applied  to  this  set,  we 
write  the  three-dimensional  Euler  equations  in  flux  form  for  the  conservative  variables  q  and  define 
the  advective  system: 


dq 

~dt 


+  Ai(q) 


where 


A*(q) 


<9q 


(84) 


(85) 


are  the  Jacobian  matrices.  As  already  done  in  Sec.  3  for  scalar  problems,  the  variational  form  of  Eq. 
(84)  can  then  be  written  as 


1ph.dq_dnh  + 

at 


*h-A‘^^dnk 


=  0 


Mipb  e  wh. 


(86) 
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As  it  is  done  in  page  40,  the  decompositions  q  =  q/l  +  q  and  rp  =  iph  + 1/>  are  plugged  into  the  variational 
problem  (86),  which  hence  can  be  split  into  the  two  equations 


ip 

<9q 

m 


[  ^.9^_dnh+ [  ^-Aj(q )^dnh 

Jnh  dt  JQh  dxi 

dKm  +  I  iph-  A®(q)  ^  dKm)  =  0  Viph  €  Wh 
OXi  J 


<h-  —  dKm+  f 

dt 

J  Km 

+  A®(q) 

P)  iK’ 

OXi  J 

nel 


xP-R.{qh)dI<m  VxpeW 


_ i  J  Kn 


m=  1 


(87a) 

(87b) 


where 

R  =  “(l  +  Ai(q)^)  (88) 

is  the  residual  operator  of  the  governing  Eq.  (84).  Equation  (87a)  is  solved  numerically  on  the  compu¬ 
tational  grid,  whereas  (87b)  is  the  subgrid  scale  equation  from  which  an  expression  for  q  is  obtained 
and  hence  plugged  back  into  (87a).  Concerning  Ecp  (87a)  for  the  large  scales,  some  assumptions  should 
be  made.  For  details,  see  the  referenced  literature.  In  the  case  of  non-viscous  problems  (i.e.  Euler  equa¬ 
tions),  SUPG  [194],  GLS  [267]  and  VMS  end  up  having  the  same  structure,  unless  the  approximation 
of  the  subgrid  scales,  q,  is  such  that  VMS  differentiates  itself  from  the  other  two  schemes. 


VMS  as  implicit  Large  Eddy  Simulation  (LES):  Without  entering  much  into  this  discussion,  it  is 
important  to  underline  the  fact  that  VMS  is  also  used  as  an  Implicit  Large  Eddy  Simulation  (ILES) 
scheme  that  relies  on  the  variational  projection  of  the  original  equations  rather  than  the  traditional 
hltering.  This  was  first  applied  to  incompressible  turbulent  flow  in  [153;  17].  In  [183;  91;  234],  a 
turbulent  compressible  flow  is  modeled  using  the  VMS  framework  although  the  fine  scales  are  modeled 
by  a  Smagorinsky  model.  Similarly,  this  is  done  in  [60;  199].  In  [306],  a  VMS  formulation  obtained  by 
extension  of  the  Favre  averaging  to  general  projection  operators  is  proposed,  where  no  explicit  subgrid 
modeling  is  presented.  Using  SEM,  VMS  was  used  in  [120]  to  solve  turbulent  incompressible  flows. 


4-7.1  Approximation  of  the  sub-grid  scales 

For  the  Navier-Stokes  equations,  analogously  to  the  advection-diffusion  equation,  the  subgrid  scale  q 
is  computed  from  the  subscale  Ecp  (87b)  and  has  the  general  form  of 

q  =  T  R(q/l),  (89) 

where  r  is  a  diagonal  matrix.  Shakib  et  al.  [267]  and  Hughes  and  Mallet  [151]  compute  the  parameter  r 
for  GLS  to  solve  the  compressible  Euler  and  Navier-Stokes  equations.  For  the  same  equations,  Hughes 
and  Tezduyar  [156]  and  Le  Beau  and  Tezduyar  [194]  compute  r  for  SUPG.  Just  like  for  the  scalar  case, 
the  parameter  r  has  been  derived  in  different  ways  by  different  authors,  although  the  final  expressions 
seldom  differ  greatly.  In,  e.g.,  [222],  r  is  derived  from  a  Fourier  analysis.  Another  approach  involves 
the  use  of  Green’s  functions  as  done  in  [61].  Regardless  of  the  definition  of  r,  let  us  notice  the  local 
nature  of  the  subscales  that  only  exist  where  the  residuals  of  the  large  scales  are  important.  This,  with 
non-constant  values,  marks  the  major  difference  with  respect  to  artificial  diffusion.  The  structure  of 
q  for  the  problem  of  a  rising  thermal  is  shown  in  the  top  two  plots  of  Fig.  16.  By  comparison  with 
the  pattern  of  potential  temperature  (bottom  left  plot)  and  horizontal  velocity  (bottom  right  plot), 
the  structure  of  the  sub-grid  scale  is  clearly  tied  to  the  residual. 

An  example  of  simulation  where  VMS  was  used  to  stabilized  both  the  dynamics  (Euler  equations) 
and  the  advection-diffusion  equations  of  water  tracers  is  shown  in  Fig.  17,  and  is  compared  against 
the  solution  obtained  in  [103]  with  a  hltered  high-order  spectral  element  at  an  equivalent  resolution. 
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Fig.  16:  2D  Rising  thermal  bubble.  Top  row:  sub-grid  scales:  9/9max  (top-left)  and  U  /Umax  (top-right).  Bottom 
row:  potential  temperature  9'  (K)  (left),  and  horizontal  velocity,  u  (m/s).  This  vertically  displacing  flow  is  triggered 
by  the  thermal  perturbation  6'  of  a  neutrally  stratified  flow  (i.e.  uniform  and  constant  9q).  The  characteristic 
shape  of  the  perturbation  field  9'  is  shown  in  the  right  panel.  The  plots  are  adapted  from  [212], 


4.8  Alternative  Consistent  Schemes:  Spectral  Vanishing  Viscosity  and  Entropy  Viscosity  method 

Also  based  on  a  second  order  operator,  the  spectral  vanishing  viscosity  (SVV)  of  [290]  is  a  stabilizing 
method  used  by  practitioners  of  high  order  spectral  element  and  spectral  Fourier  methods.  The  idea 
of  SVV  comes  from  an  entropy  analysis  of  the  problem  at  hand  and  is  such  that  the  added  dissipation 
satisfies  the  entropy  condition.  For  more  on  this,  please,  see  [290;  130;  173]  and  references  therein. 

Also  tied  to  the  entropy  equation,  the  entropy  viscosity  method  was  first  introduced  by  [128].  The 
fundamental  difference  between  this  method  and  SVV  is  in  the  way  the  entropy  equation  is  used. 
The  entropy  viscosity  method  builds  the  local  and  dynamic  viscosity  of  the  equations  based  on  the 
residual  of  the  associated  entropy  equation  [127;  129].  In  [335]  we  find  how  this  regularization  of  the 
governing  equations  is  applied  to  the  discontinuous  Galerkin  method  as  well. 
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Fig.  17:  2D  squall  line  simulation.  The  cloud  content  is  delimited  by  the  thick  black  contour  line  in  both  plots. 
The  color/grey  shading  in  the  left/right  plots  is  the  equivalent  potential  temperature.  Both  horizonthal  domains 
extend  along  240  km.  The  left  figure  is  adapted  from  [213]  where  a  VMS  stabilized  FEM  solution  was  computed 
with  linear  elements  (with  permission  by  Elsevier.)  The  right  figure  is  adapted  from  [103]  (with  permission  of  the 
American  Meteorological  Society)  and  the  solution  was  computed  with  8  -order  SEM  stabilized  with  a  constant 
coefficient  diffusion  ( i /  =  200m2/s)  and  a  filter. 


We  filially  comment  on  the  use  of  adaptive  viscosity  methods  for  both  CG  [125]  and  DG  [236]. 
These  two  methods  are  not  consistent,  but  are  element-based  and  dynamic.  An  adaptive  artificial 
viscosity  for  non-liydrostatic  modeling  using  DG  has  been  recently  proposed  in  [329]. 


4.9  Physics-based  stabilization 

A  computationally  inexpensive  and  numerically  stable  sub-grid  scale  model  for  compressible  large- 
eddy  simulation  was  introduced  in  [229]  for  adaptive  finite  elements.  Due  to  its  stabilizing  properties, 
this  method  was  easily  adapted  to  the  solution  of  low  Mach  number  atmospheric  flows  via  high  order 
spectral  elements  in  [215]  and  [216].  Like  VMS,  this  method  is  a  residual-based  alternative  to  the 
more  classical  artificial  diffusion  in  a  way  that  not  only  is  numerically  consistent,  but  could  also  serve 
as  a  turbulence  model.  Unlike  VMS,  however,  stabilization  is  attacked  starting  from  the  governing 
equations  rather  than  from  their  numerical  approximation.  More  specifically,  the  Euler  equations  are 
first  filtered  to  separate  the  resolved  from  the  un-resolved  scales  [108;  260].  The  filtering  operation 
leads  to  a  new  set  of  equations  containing  additional  terms  that  are  dissipative  in  nature  and  that  are 
then  modeled  in  some  way.  The  steps  described  below  (following  the  treatment  of  [216])  show  how 
stabilization  is  then  achieved.  In  LES,  given  a  quantity  q  (e.g.,  density,  velocity,  potential  temperature), 
its  large  scale  (grid  resolved)  component  q  is  obtained  via  the  application  of  the  filter 

«(x)=  [  GA(x-x)q(x)dX-  (90) 

J  n 

Eq.  (90)  is  a  spatial  convolution  of  the  filtering  function  G'/iwith  q ,  where  A  is  the  filter  width.  The 
hlter  functions  can  vary;  the  most  commonly  used  in  LES  are  the  Gaussian,  the  top  hat  in  real  space, 
and  the  sharp  Fourier  cutoff  functions  [197;  240]. 

Remark  4-1  The  barred  quantities  introduced  in  Section  2.1.2  have  no  relation  with  q  defined  in 
Eq.  (90)  for  Large  Eddy  Simulation  (LES).  LES  implies  a  separation  between  the  resolved  and  unre¬ 
solved  scales,  whereas  the  splitting  given  in  Section  2.1.2  was  introduced  for  numerical  convenience  in 
the  simulation  of  atmospheric  problems  and  does  not  affect  the  way  LES  is  constructed  or  derived. 

For  compressible  flows,  the  Favre  hlter  q  =  ~pq/~p  [92],  although  not  necessary,  is  classically  intro¬ 
duced.  The  application  of  these  filters  to  equations  (6)  -excluding  the  Coriolis  terms  for  simplicity- 
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yield  the  filtered  system 

+  V  •  (pu  ®  u)  +  Vp  =  V  ■  r  —  pg,  (91a) 

g  +  V-(p5)=0,  (91b) 

^  +  V-(pd u)=V-Q,  (91c) 

where  the  two  derivatives  on  the  right-hand  sides  of  (91a)  and  (91c)  represent  the  contributions  of 
the  unresolved  scales.  If  Favre  filtering  were  not  applied,  an  additional  flux  term  would  also  appear 
on  the  right-hand  side  of  Eq.  (91b).  With  Favre,  the  filtered  density  p  is  conserved  and  no  modeling 
is  required  for  the  continuity  equation.  In  (91a),  r  is  the  turbulent  stress  tensor, 


:  p  (u®u-  u®u^)  , 


approximated  by 

t  =  2D(u), 

(92) 

where 

©(5)=  (Vu  + Vut) 

is  the  velocity  deformation  tensor  multiplied  by  a  dynamic  coefficient  p n  that  will  be  defined  shortly. 
Similarly,  in  (91c)  Q  is  the  kinematic  heat  flux  defined  as 


Q  =  p^#u-#uj,  (93) 

and  is  modeled  via 

Q  =  KnVd  (94) 

Like  pn,  the  definition  of  k„  determines  the  method  proposed  in  [229].  The  coefficients  pn  and  k„  are 
calculated  element-wise  on  every  high  order  element  fle  for  a  Galerkin  approximation  of  Equations 
(91).  More  specifically,  for  the  sensible  temperature  T  =  0(p/po)R^Cr‘  and  one  finite/spectral  element 
of  characteristic  length  hoe,  we  start  by  defining  the  dynamic  viscosities 
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(96) 


In  (96)  T  indicates  the  space  average  of  the  quantity  at  hand  over  Q  and  the  norms  ||  •  ||oo,r?  at  the 
denominator  are  used  for  normalization  to  preserve  the  correct  dimension  of  the  resulting  equation. 
Having  /imax  and  pres  constructed,  the  dynamic  coefficients  of  the  viscosity  terms  can  be  computed 
as 

Pn  |  Qe  —  min  1 1  p  1 1  oo ,  Qe  ( h  ma.x  |  J?e  ,  Mres|fie)  (9^) 


and 

Pr 

Kn\ Qe  =  - 7  Pn\ne,  (98) 

7-1 

where  Pr  is  an  artificial  Prandtl  number.  The  residuals  in  (96)  are  simply: 


R(  u) 


dpu 

~dt 


+  V  •  (pu  ®  u)  +  Vp+  pg, 


(99a) 
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Fig.  18:  Stabilized  solution  of  the  density  current  problem  [284].  Reproduced  from  [216]. 


R(p)  =  -7^  +  V-(pu),  (99b) 

R(P)  =  9^+  V '  (^5)  ■  (99c) 

The  time  derivatives  are  to  be  included  or  the  consistency  of  the  method  would  be  lost.  An  example  of 
the  stabilized  spectral  element  solutions  reported  in  [216]  is  plotted  in  Fig.  18,  where  also  the  results 
obtained  using  a  constant  coefficient  Lilly-Smagorinsky  model  [201;  274]  are  given  for  comparison. 
Putting  together  the  moist  problem  briefly  described  in  Section  2.4  and  the  current  LES-based  stabi¬ 
lization,  the  simulation  of  a  fully  three-dimensional  deep  convection  problem  is  reported  in  [215];  in 
Fig.  19,  we  reproduce  Fig.  3  contained  therein. 

The  multi-scale  properties  of  this  scheme  have  been  verified  via  the  simulation  of  a  turbulent  flow 
on  the  sphere  whose  radius  is  that  of  the  earth.  As  an  example,  a  turbulent  flow  in  a  geostrophically 
balanced  atmosphere  is  shown  in  Fig.  20,  after  [216]. 


z  (m) 
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Fig.  19:  Deep  convection:  3D  view  of  qc  (grey  surface),  surface  velocity  (vectors),  and  the  instantaneous  distribution 
of  qr  on  the  ground  (contours).  Reproduced  from  Fig.  3  of  [215]. 


Fig.  20:  Turbulent  flow  on  the  sphere  after  12,  20,  and  25  days.  Top- view,  looking  down  onto  the  northern 
hemisphere.  The  radial  component  of  vorticity  is  plotted  and  colored  by  intensity.  Plot  adapted  from  [216]. 
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Fig.  21:  Representation  of  a  smooth  mountain  using:  (a)  height  coordinate  system  with  step  topography,  (b) 
(T-terrain  following  coordinates,  and  (c)  height  coordinate  system  with  shaved  cells. 


5  Vertical  discretization,  computational  grids,  and  adaptive  mesh  refinement  in  NWP 

We  briefly  discuss  the  issue  of  vertical  discretization  in  atmospheric  models  since  it  is  characterized 
by  some  constraints  that  do  not  apply  to  more  traditional  and  general  CFD  models.  Because  of 
the  classical  use  of  finite  differences  with  Cartesian  rectangular  grids,  the  accurate  approximation 
of  topography  has  always  been  a  major  concern  both  in  atmospheric  and  ocean  models.  The  verti¬ 
cal  coordinate  systems  can  be  separated  into  two  main  branches:  a  terrain-following  [237;  104]  and 
height-coordinates.  Terrain- following  coordinates  have  the  advantage  of  the  accurate  representation 
of  topography  and  ease  of  application  of  boundary  conditions  as  the  grid  cells  follow  the  shape  of 
the  varying  bottom  of  the  domain.  However,  the  large  truncation  errors  that  increase  with  increasing 
topography  slope  [286;  164]  require  vertical  coordinates  that  are  more  suitable  for  steep  topographies. 
The  height-coordinate  system  was  first  proposed  as  the  ?;-system  by  [219].  It  consists  of  the  use  of  a 
rectangular  grid  that  intersects  the  topography  and  defines  the  orographic  height  at  the  cells  edges. 
Modification  of  both  approaches  have  been  later  defined.  Examples  are  the  hybrid  terrain-following 
coordinates  [268]  as  an  improvement  of  a,  or  the  shaved-cell  method  in  z-coordinates  introduced  by 
[1]  for  ocean  models.  Fig.  21  shows  a  schematic  of  these  grids. 

The  a  grid  mentioned  above  is  simple,  but  on  steep  topography  the  regularity  of  the  grid  in  the 
inner  domain  is  compromised.  To  overcome  this  drawback,  [263]  introduced  the  smooth  level  vertical 
(SLEVE)  mapping  that  helps  maintain  a  sufficient  degree  of  regularity  of  the  node  distribution  away 
from  the  bottom  boundary.  Given  a  mountain  ridge,  a  SLEVE  grid  is  obtained  from  the  decomposition 
of  a  large  and  small  scale  variation  of  topography  (e.g.  a  Gaussian  terrain  perturbed  by  a  wave-like 
function).  Through  this  solution  the  grid  distortion  is  controlled  from  bottom  to  top  by  means  of 
two  free  parameters.  Somewhere  between  a  and  SLEVE  stands  the  hybrid  grid  of  [268].  The  hybrid 
grid  uses  the  same  vertical  coordinate  a  and  combines  the  topography  and  the  height  of  the  domain 
through  two  functions  o(cr)  and  b(a)  whose  values  are  properly  tabulated. 

Finite  elements  and  Galerkin  methods  in  general  (finite  volumes  included)  are  free  of  all  the 
drawbacks  of  methods  that  are  not  flexible  with  regard  to  the  grid.  Finite  elements  depend  on  compu¬ 
tational  grids  of  quadrilateral  and  triangular  elements  (in  2D)  or  hexahedra,  tetrahedra,  and  prisms 
(in  3D)  that  adjust  to  the  physical  geometry  to  be  discretized  without  affecting  the  formulation  of  the 
governing  equations.  The  grid  shape  is  inherently  defined  in  the  numerical  formulation  of  the  method. 
Generally  speaking,  they  are  z-coordinate  based  methods  with  full  control  of  the  shape  of  the  to¬ 
pography.  The  grid  itself  looks  like  a  cr-grid,  but  the  fundamental  difference  is  that  finite  difference 
methods  with  a  grids  require  re-expressing  the  equations  using  a  coordinate  transformation. 

Due  to  the  geometrical  flexibility  of  Element-Based  Galerkin  (EBG)  methods,  no  coordinate  trans¬ 
formation  is  needed  to  apply  the  ground  boundary  condition.  Complex  orography  can  be  modeled  with 
ease  using  finer  or,  perhaps,  adaptive  grids  (see  Section  5.2),  as  long  as  certain  criteria  on  regularity 
and  smoothness  of  the  element  shape  are  respected.  Furthermore,  in  a  time  when  high  resolution  is  the 
rule,  complex  orography  can  be  modeled  with  ease  and  better  grids.  High  resolution  terrain-following 
coordinates  induce  grids  to  lose  the  property  of  orthogonality  at  the  boundaries.  The  internal  elements 
as  well  would  suffer  from  great  stretching  up  to  a  point  that  the  grid  is  no  longer  sufficiently  smooth 
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for  the  numerical  method  to  perform  correctly.  For  example,  if  the  Jacobian  of  the  transformation 
from  physical  to  computational  space  is  singular,  large  numerical  errors  and  instability  in  the  solution 
would  occur  [300].  The  application  of  CFD  grid  generation  techniques  for  use  in  atmospheric  problems 
is  being  considered  more  and  more.  Simple  and  fast  structured  grid  generation  with  boundary  layer 
grids  or  elliptic  smoothing  has  been  described  in,  e.g.,  [208].  Unstructured  grids  are  also  becoming  of 
interest,  as  shown  a  few  years  ago  by  [5]  and,  more  recently,  by  [275].  However,  the  inertia  from  the 
atmospheric  community  towards  grids  that  do  not  have  a  characteristic  column-wise  structure  is  still 
large.  This  is  because  all  of  the  packages  that  involve  the  computation  of  atmospheric  parameteriza- 
tions  (e.g.,  precipitation,  radiation)  are  designed  to  work  on  such  grids  and  would  have  to  be  adapted 
(i.e.  re-written)  to  work  on  different  grids.  The  reasoning  behind  this  inertia  is  understandable,  al¬ 
though  steps  ahead  in  this  direction  must  be  made  now  that  high-resolution  atmospheric  modeling  is 
approaching  fast. 


5.1  3D  grid  generation  for  domains  with  orography  and  bathymetry 

Volume  grid  generation  in  atmospheric  models  is  commonly  performed  by  a  one  directional  simpli¬ 
fication  of  Transfinite  interpolation  (TFI)  [122;  89].  TFI  is  robust,  simple,  and  arguably  the  fastest 
grid  generation  technique  in  use  in  many  fields  of  computational  mechanics,  of  which  geophysical  fluid 
dynamics  represents  a  particular  case.  Nevertheless,  generally  the  quality  of  TFI  grids  degenerates 
when  the  geometric  features  of  the  domain  boundaries  present  sharp  corners,  quasi- vertical  boundary 
walls,  or  similar  characteristics.  This  has  a  direct  effect  on  the  quality  of  the  numerical  solution  of  the 
problem  [217].  The  problem  exists  regardless  of  the  underlying  numerical  method  of  solution.  In  NWP, 
sharp  mountain  ridges  and  canyons  are  an  example.  With  the  ever  increasing  trend  towards  high  spa¬ 
tial  resolution  that  we  are  experiencing  in  numerical  weather  prediction  today,  sharp  topographies  are 
certainly  an  issue.  In  the  following  sections,  we  describe  the  current  way  of  generating  structured  grids 
in  topographical  domains  and  present  a  few  examples  to  underline  the  possible  limitations.  At  that 
point,  we  introduce  the  idea  behind  elliptic  grid  generation  and  how  grids  may  be  improved  in  terms 
of  smoothness  and  orthogonality  properties  by  this  simple  technique.  Most  of  the  ideas  presented  in 
this  appendix  are  found  in  the  books  by  [182]  and  [298],  and  in  the  recent  paper  by  [175]. 


5.1.1  Algebraic  grid  generation 

As  we  have  mentioned  above,  transfinite  interpolation  has  a  major  drawback  that  comes  from  the 
constraint  on  the  regularity  of  the  boundaries.  If  the  boundaries  of  the  simply-connected  domain  are 
not  sufficiently  smooth,  TFI  fails  to  generate  good  grids.  The  sharpness  of  internal  corners  given  by  a 
possible  discontinuity  in  the  space  derivative  of  the  boundary  functions,  reflects  into  folding  grids  with 
unacceptable  node  overlapping.  The  problem  of  folding  grids  with  difficult  geometries  is  usually  solved 
by  subdividing  the  domain  into  smaller  subdomains  with  more  regular  boundaries.  This  technique  is 
robust  but  difficult  to  automate.  In  Fig.  23,  although  the  edges  do  not  cross,  the  vertical  wall  on  the 
left-hand  side  of  the  hill  is  a  challenge  for  the  grid  generator,  as  it  can  be  noted  by  the  extremely 
stretched  elements  in  the  region  of  the  hill’s  front. 

Nonetheless,  because  topography  is  usually  smooth  in  current  operational  models  (at  horizontal 
resolution  of  1  km  or  coarser,  all  mountain  peaks  are  likely  to  be  smoothed  out),  TFI  is  still  the  perfect 
and  quick  solution  that  can  be  properly  modified  for  different  types  of  vertical  node  distributions. 

These  improved  methods  are  sufficiently  good  as  long  as  the  boundaries  are  never  vertical.  This 
is  because  the  transformations  are  performed  along  2  only.  For  full  control  of  the  nodes’  distribution 
in  all  directions,  these  schemes  should  be  incorporated  into  a  full  TFI  interpolation. 
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5.1.2  Elliptic  grid  generation 

One  simple,  yet  efficient  solution  to  the  generation  of  smooth  grids  with  sufficiently  good  properties 
is  given  by  the  solution  of  the  Thompson-Thames-Mastin  (TTM)  problem  [297],  an  elliptic  system 
of  partial  differential  equations.  Two-dimensional  elliptic  grid  generation  was  introduced  for  ocean 
circulation  modeling  in  [259].  The  penalty  of  elliptic  equation  methods  is  the  higher  cost  with  respect 
to  algebraic  methods.  To  control  the  point  distribution  with  TTM  some  parameters  must  be  selected  by 
the  user.  To  overcome  the  need  for  parameter  selection,  in  [175]  an  automatic  elliptic  grid  generation 
method  is  proposed.  A  similar  approach  is  described  in  [180]  for  grids  around  topography.  In  this 
recent  paper,  the  author  also  uses  an  iterative  method  to  smooth  the  grid  on  a  layer-by-layer  basis, 
with  a  check  on  the  grid  spacing  to  avoid  the  overlapping  of  grid  cells.  This  check  is  necessary  because 
the  method  of  [180],  unlike  the  elliptic  scheme,  is  not  designed  to  respect  the  maximum  principle. 

Orthogonality  When  it  comes  to  high  resolution  simulations,  with  very  fine  LES  grids,  the  boundary 
layer  may  be  solved  explicitly.  In  this  case,  boundary  layer  grids  may  be  necessary  for  atmospheric 
models  like  they  are  for,  e.g.,  industrial  flows  at  much  smaller  scales.  For  how  the  atmospheric  com¬ 
munity  is  responding  to  the  introduction  of  new  grids,  orthogonal  boundary  grids  may  still  be  seen 
as  futuristic.  However,  their  use  is  already  common  in  the  simulation  of  atmospheric  flows  in  the 
micro-scale  (i.e.  20  to  500  m  domains)  (see,  e.g.,  the  Bolund  experiment  starting  from  [22])  so  that  it 
seems  appropriate  to  mention  them  here. 

Orthogonality  in  three-dimensional  structured  grid  generation  systems  is  still  an  active  held  of 
work  (see  [298;  175]).  The  elliptic  grid  generation  system  herein  described  and  implemented  in  [221] 
is  able  to  reach  reasonable  orthogonality  properties  at  the  lower  boundary  by  either  using  Neumann 
boundary  conditions  to  move  the  nodes,  or  by  a  proper  definition  of  the  control  functions  as  done 
in  [175].  Currently,  a  quasi-orthogonal  system  is  the  best  that  we  can  achieve  with  the  available 
algorithms  from  the  literature.  Fig.  22  shows  how  a  non-orthogonal  grid  is  transformed  to  a  quasi- 
orthogonal  mesh  in  the  proximity  of  the  boundary.  This  grid  was  deliberately  relaxed  to  the  point 
where  the  boundary  layer  is  completely  lost.  This  was  done  to  clearly  show  orthogonality  at  the 
boundary.  However,  maintaining  a  proper  stretching  ratio  in  the  proximity  of  the  boundary  without 
affecting  orthogonality  remains  an  open  problem.  A  compromise  is  needed  in  building  the  grid,  and 
experimentation  on  different  topographies  may  be  necessary. 

We  report  a  few  two-  and  three-dimensional  examples  of  grids  generated  using  both  algebraic 
and  elliptic  methods.  Fig.  22  shows  the  computational  grid  around  a  cosine  function  obtained  by 
TFI  interpolation,  TFI  with  an  orthogonal  multi-surface  method,  and  with  an  elliptic  grid  generator. 
The  geometry  is  straightforward  to  mesh.  The  three  methods  give  similar  results;  however,  using  the 
elliptic  method  together  with  a  multi-surface  technique  to  achieve  orthogonality  clearly  produces  a 
better  boundary  layer  grid.  The  problem  is  taken  a  little  further  with  the  fully  three-dimensional  mesh 
of  the  Bolund  hill  in  Denmark.  The  improvement  in  terms  of  regularity  of  the  grid  in  the  internal 
volume  and  in  terms  of  quasi-orthogonality,  is  evident  from  panel  (b)  in  Fig.  23,  where  the  elliptic 
solver  was  applied  with  a  few  iterations  to  improve  the  algebraic  grid  of  panel  (a). 


5.2  Adaptive  mesh  refinement 

The  term  adaptive  mesh  refinement  (AMR)  describes  mesh  generation  techniques  in  which  the  spatial 
resolution  is  adjusted  depending  on  certain  properties  of  the  specific  application.  Within  AMR  one 
distinguishes  between  static  and  dynamic  AMR.  In  static  AMR  the  mesh  is  adjusted  once  at  the 
beginning  of  the  simulation  whereas  dynamic  AMR  adapts  the  resolution  for  the  whole  duration  of 
the  simulation  as  a  function  of  the  structure  of  the  solution  based  on  some  pre-dehned  criterion. 

The  idea  to  increase  the  resolution  in  part  of  the  domain  has  a  long  history  in  scientific  computing 
and  also  in  meteorology.  Usually  this  adjustment  of  the  resolution  is  done  by  coupling  two  numerical 
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Fig.  22:  (a)  TFI,  (b)  orthogonal,  (c)  elliptic  (not  orthogonal). 


Fig.  23:  (a)  TFI  and  (b)  elliptic  volume  grids.  In  this  plot  there  is  no  grid  control  in  the  proximity  of  the  boundary 
surface.  The  elliptic  grid  is  computed  with  50  iterations. 


models  with  different  resolutions  (so  called  nesting).  The  easiest  way  to  implement  nesting  is  to  first 
run  a  full  coarse  simulation  and  then  use  the  result  of  this  coarse  simulation  as  boundary  conditions  for 
a  higher  resolved  simulation  in  a  smaller  domain.  In  this  approach  the  result  of  the  higher  resolution 
simulation  cannot  affect  the  coarse  simulation.  For  this  reason  this  approach  is  called  one-way  nesting. 
An  example  for  one-way  nesting  can  be  found  in  the  1976  work  by  Davies  [72]  and  the  work  of 
Miyakoda  and  Rosati  in  1977  [220].  More  difficult  but  also  more  accurate  is  two-way  nesting  in  which 
both  numerical  models  are  allowed  to  interact  which  each  other.  This  means  that  the  result  on  the 
coarse  mesh  is  not  only  used  as  an  initial  and  boundary  condition  of  the  finer  mesh  but  the  result 
on  the  finer  mesh  is  also  used  to  improve  the  accuracy  of  the  simulation  using  the  coarse  mesh.  An 
example  of  two-way  nesting  is  the  work  by  Zhang  et  al.  [330].  Nesting  does  not  need  to  be  static. 
The  domain  of  the  higher  resolution  simulation  can  move  within  the  domain  of  the  coarse  resolution 
simulation  like  Ley  and  Elsberry  did  in  1976  [200].  Nesting  is  not  restricted  to  combining  two  different 
simulations.  More  than  two  different  resolutions  can  be  combined,  as  in  Ginis  et  al.[109]. 

An  alternative  to  increasing  the  spatial  resolution  via  nesting  is  to  use  variable  mesh  spacing  in 
the  different  directions  like  in  Staniforth  and  Mitchell  [280],  or  to  adjust  the  mesh  with  the  help  of  a 
coordinate  transformation,  as  in  Dietachmeyer  and  Droegemeier  [75]. 

Dynamic  AMR  in  which  the  mesh  is  repeatedly  adjusted  according  to  the  current  intermediate 
result  of  the  simulation  has  been  used  in  engineering  applications  for  a  long  time  [24;  23].  The  first 
application  of  this  kind  of  dynamic  AMR  in  atmospheric  sciences  was  done  by  Skamarock  et  al.  in 
1989  [272]  and  Skamarock  and  Klemp  in  1993  [270].  A  first  approach  to  use  dynamic  adaptive  mesh 
refinement  operationally  was  given  by  the  OMEGA  model  (OMEGA  stands  for  Operational  Multiscale 
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Environment  Model  with  Grid  Adaptivity).  The  OMEGA  model  was  presented  in  the  work  of  Bacon 
et  al.  in  2000  [8].  Simulations  of  hurricane  tracks  by  Gopalakrishnan  et  al.  in  2002  [121]  demonstrate 
that  the  accuracy  of  the  hurricane  simulation  can  be  improved  significantly  by  using  dynamic  AMR 
while  at  the  same  time  reducing  the  runtime  of  the  simulation.  There  are  however  still  many  open 
questions  that  need  to  be  resolved  for  a  broader  application  of  dynamic  AMR  in  atmospheric  sciences 
[315].  More  details  about  the  historical  evolution  of  AMR  can  be  found  in  [18]  and  [159]. 

Within  dynamic  mesh  refinement  there  are  three  possible  approaches  to  adjust  the  accuracy  of 
the  simulation  according  to  the  current  flow: 

—  li-adaptive  mesh  refinement:  the  spatial  resolution  is  adjusted  by  adding  or  removing  grid  points  in 
the  mesh.  In  an  element-based  method  this  is  done  by  subdividing  elements  into  smaller  elements 
or  merging  elements  into  larger  elements.  The  total  number  of  elements  and  grid  points  is  allowed 
to  change  in  this  approach.  This  makes  it  necessary  to  redistribute  elements  sometimes  when 
multiple  computing  nodes  are  used  for  the  computation.  We  discuss  this  approach  more  in  detail 
below. 

—  r-adaptive  mesh  refinement  (or  moving  mesh):  the  grid  points  and  therefore  elements  are  moved 
and  deformed  in  such  a  way  that  the  spatial  resolution  gets  finer  in  those  parts  of  the  domain 
where  the  accuracy  of  the  simulation  needs  to  be  increased.  This  reduces  automatically  the  density 
of  grid  points  in  other  parts  of  the  domain  and  therefore  reduces  accuracy  in  those  parts.  The  total 
number  of  grid  points  and  elements  is  constant  in  this  approach.  An  example  of  this  approach  can 
be  found  in  the  work  of  Kiihnlein  et  al.  [188],  Budd  and  Williams  [40],  and  Bauer  et  al.  [16]. 

—  p-adaptive  mesh  refinement:  the  accuracy  of  the  simulation  is  adjusted  by  changing  the  polynomial 
order  of  the  spatial  discretization.  The  size  and  location  of  the  elements  remains  unchanged  in 
this  approach.  One  of  the  first  descriptions  of  this  approach  can  be  found  in  the  work  of  Babuska 
et  al.  in  1981  [7].  Application  of  this  approach  to  geophysical  modeling  can  be  found  recently  in 
[303]  and,  earlier  on  in  [90]. 

These  three  approaches  for  dynamic  AMR  can  be  combined  with  each  other  like  in  the  work  by 
Lang  et  al.  [190],  Pigott  et  al.  [239],  and  [111]  and  references  therein. 

We  concentrate  in  the  following  on  h-adaptive  mesh  refinement  more  in  detail.  Within  this  ap¬ 
proach  we  can  distinguish  between  conforming  AMR  and  non-conforming  AMR.  A  rising  warm  air 
bubble  simulation  using  conforming  and  non-conforming  AMR  is  shown  in  Fig.  24. 


5.3  Non-conforming  mesh  refinement 

As  explained  in  the  section  above,  mesh  rehnement  techniques  create  either  conforming  or  non- 
conforming  meshes.  In  conforming  meshes  each  element  has  only  one  neighbor  per  element  face 
(ft-conforming),  so  a  situation  where  more  than  two  elements  share  the  same  face  is  not  allowed. 
Also,  the  elements  have  to  be  p-conforming,  that  is  the  approximating  polynomials  in  both  neighbor¬ 
ing  elements  are  of  the  same  order  and  the  nodal  points  on  both  sides  of  the  face  coincide.  An  example 
of  a  ftp-conforming  element  interface  is  shown  in  Fig.  25,  where  elements  A  and  B  exclusively  share 
the  same  face  (side)  and  have  the  same  polynomial  expansion  order.  In  such  a  situation  there  are  no 
additional  requirements  for  the  numerical  method.  The  burden  of  creating  a  conforming  mesh  after 
the  refinement  falls  entirely  on  the  AMR  algorithm. 

In  non-conforming  meshes,  however,  one  needs  to  account  for  faces  that  are  shared  by  more 
than  two  elements  (ft- non-conforming,  see  interface  between  elements  C,  D,  and  E  in  Fig.  25)  or 
with  different  polynomial  approximation  on  both  sides  (p-non-conforming  interface  between  elements 
B  and  C  in  Fig.  25).  Mind  that  Fig.  25  does  not  illustrate  all  the  possibilities  of  non-conforming 
configurations.  One  other  possibility  is  a  ftp-non-conforming  interface,  where  an  edge  (or  face)  is 
shared  by  more  than  two  elements  of  different  polynomial  orders.  To  complete  the  discussion  of 
meshes  and  element-based  Galerkin  methods,  in  this  section  we  provide  an  overview  of  methods  used 
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Fig.  24:  Conforming  (left)  and  non-conforming  (right)  AMR  simulations  of  a  rising  thermal  bubble.  The  left  figure 
is  adapted  from  [224].  The  right  one  is  adapted  from  [184]. 


A 

B 

Fig.  25:  Different  element  interfaces,  where  A-B  is  conforming,  B-C  p-non-con forming,  and  C-D-E  //.-non- 
con  (or  mi  rig.  Dashed  lines  symbolize  the  higher  order  mesh  within  the  elements. 


to  reconcile  non-conforming  elements.  Note  that  conforming  AMR  grids  do  not  require  any  special 
handling  by  the  numerical  method  and  so  we  will  discuss  conforming  AMR  no  further,  but  rather, 
shall  concentrate  on  non-conforming  AMR. 


5.3.1  Mortar  element  method 

The  first  to  introduce  a  non-conforming  formulation  for  spectral  element  methods  were  Maday,  Mar- 
viplis  and  Patera  [205],  who  presented  the  mortar  element  method  (MEM),  where  the  domain  is  split 
into  blocks  of  conforming  elements,  and  a  new  trace  space,  namely  mortars,  is  introduced  to  couple 
the  non-conforming  blocks.  The  MEM  was  an  extension  of  classical  non-conforming  methods  in  the 
finite  element  community  [285;  51;  81]  with  the  difference  that,  besides  being  applied  to  the  spectral 
element  method,  it  did  not  rely  on  Lagrange  multipliers  or  master-slave  relations  of  non-conforming 
edges  of  the  elements. 

In  MEM,  the  mortars  are  one- dimensional  constructs  (in  2D;  they  are  two-dimensional  in  3D)  with 
a  polynomial  space  defined  on  them.  The  task  of  the  mortar  is  to  reconcile  the  C°  continuity  condition 
between  the  non-conforming  elements  that  the  mortar  is  connecting.  In  other  words,  the  mortar  is 
an  interface  between  the  non-conforming  element  faces  (see  Fig.  26).  The  polynomial  order  on  the 
mortar  is  typically  chosen  to  match  the  highest  order  expansion  among  the  elements  contributing 
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Fig.  26:  Schematic  of  mortar  element  method.  The  mortars  are  binding  non-conforming  elements  that  sum  the 
contribution  from  element  edges  and  apply  an  L2  projection  of  the  mortar  data  back  to  the  element  edges.  A 
single  arrow  represents  direct  assignment  of  the  vertex  value,  while  double  arrow  represents  L2  projection. 


to  the  mortar.  The  end-point  values  of  the  mortar  solutions  are  constrained  to  match  the  values  at 
corresponding  vertices  of  the  original  elements  (represented  by  single  arrows  in  Fig.  26).  The  integral 
projection  operation  ( L2  projection)  is  defined  to  project  the  solution  from  the  mortar  to  the  interior 
points  of  the  non-conforming  element  edges  (double  arrows  in  Fig.  26).  If  we  write  this  operation  in 
matrix  form  as  Qnxm ,  where  n  is  the  number  of  nodes  on  the  element  edge,  and  m  is  the  number 
of  nodal  points  on  the  mortar,  then  the  operation  QT  will  sum  the  contributions  from  the  element 
edges  to  the  mortar.  To  reconcile  the  C°  continuity  condition  we  first  sum  the  contributions  from 
element  edges  on  the  mortar  ( QT ),  perform  weighted  averaging,  and  project  the  result  back  to  the 
element  edges  (Q).  Mind  that  this  method  only  minimizes  the  discontinuity  but  does  not  enforce  a 
strict  C°  continuity.  Even  though  here  we  present  only  a  limited  spectrum  of  possible  non-conforming 
configurations,  MEM  is  very  general  and  can  be  applied  in  more  complicated  situations  [205]. 


5.3.2  Pointwise-matching  method 

Another  method,  stemming  from  the  finite  element  community,  is  the  pointwise  matching  method 
(PMM),  or  the  interpolation-based  method  [93;  254;  47].  In  this  approach  both  h  and  p-non-conforming 
elements  are  allowed,  however  it  is  assumed  that  for  /i-non-conforming  elements  there  is  one  parent 
edge  on  one  side  of  the  interface,  and  two  children  edges  on  the  other  side.  I11  the  case  of  the  p-non- 
conforming,  the  parent  edge  is  the  one  with  lower  polynomial  order. 


D 


E 


Fig.  27:  Schematic  of  the  pointwise-matching  method.  Parent  points  are  marked  with  filled  circles.  Values  at 
children  points  depend  on  the  values  of  the  parent  points.  Here  operation  Q  marks  the  interpolation  from  the 
parent  element  to  children  elements. 
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In  Fig.  27  the  parent  edge  belongs  to  element  C  and  the  parent  points  are  marked  with  filled 
circles.  The  values  at  the  child  points  are  interpolated  from  the  parent  points.  Here  Q  denotes  this 
interpolation.  To  ensure  strict  C°  continuity,  the  solution  from  the  child  points  is  first  added  to  the 
parent  points  via  the  operation  QT .  The  solution  is  averaged  at  the  parent  edge  and  interpolated 
onto  the  child  edge  via  Q.  We  mark  the  corner  points  of  children  elements  with  a  filled  circle,  as 
the  interpolation  between  those  points  and  corresponding  points  at  the  parent  edge  is  trivial.  Unlike 
MEM,  the  continuity  here  is  strictly  enforced. 

It  is  possible  to  express  PMM  using  mortars  and  hence  bring  those  two  methods  into  one  frame¬ 
work.  This  can  be  achieved  by  using  the  mortar  infrastructure  and  replacing  the  L-2  projection  by 
interpolation.  In  such  an  approach,  the  choice  of  Q  (projection  or  interpolation)  will  define  the  method. 
Traditionally,  in  MEM  we  choose  the  size  of  mortars  to  correspond  to  short  (children)  edges,  while 
in  PMM  we  use  the  parent  edge  as  a  mortar  analogue.  [265]  investigates  how  different  choices  of  the 
size  of  mortars  affects  the  performance  of  both  PMM  and  MEM. 

5.3.3  Mortar  elements  for  DG  and  application  to  atmospheric  simulations 

The  early  work  on  MEM  focused  mainly  on  elliptic  problems  and  spectral  element  methods  [205;  25]. 
Ivopriva  [186]  applied  the  MEM  to  compressible  flows  and  the  DG  method  by  imposing  an  additional 
condition  on  the  global  conservation  of  the  mortar  approximation,  as  well  as  outflow  conditions.  The 
outflow  condition  required  that  the  solution  from  the  "upwind  side"  of  the  mortar  after  projection 
onto  the  mortar  and  back  to  the  face  remains  unchanged.  This  work  was  used  later  for  atmospheric 
simulations  in  [33;  184].  Based  on  the  two-dimensional  work  reported  in  [184],  Marras  et  al.  [211]  used 
MEM  in  a  unified  CG/DG  shallow  water  model  on  the  sphere  (with  static  and  dynamic  adaptivity.) 
An  application  of  the  pointwise-matching  scheme  to  geophysical  simulations  can  be  found  in  [258]. 
In  [185],  Ivopera  and  Giralclo  presented  a  unified  framwework  including  both  CG  and  DG  methods, 
as  well  as  integral  projection  (for  DG)  and  pointwise-matching  (for  CG)  schemes  for  non-conforming 
interfaces  and  found  that  similar  mass  conservation  properties  can  be  obtained  for  both  configurations. 

5.3.4  Unified  CG/DG  non- conforming  method 

The  idea  of  a  unified  CG/DG  method  stems  from  the  similarity  of  both  approaches.  Much  of  the 
mathematical  operations,  and  therefore  much  of  the  code  implementation  is  the  same,  with  an  excep¬ 
tion  of  communication  between  elements.  A  time-step  of  a  unified  CG/DG  method  can  be  described 
by  the  following  algorithm. 

1.  Evaluate  volume  integrals  for  each  element  and  store  as  right-hand-side  RHS. 

2.  Perform  inter-element  communication: 

For  CG,  perform  Direct  Stiffness  Summation  on  RHS  to  ensure  C°  continuity, 

For  DG,  evaluate  element-boundary  integrals  (fluxes)  and  update  RHS. 

3.  Divide  by  the  mass  matrix.  Notice  that  the  mass  matrix  for  CG  corresponds  to  the  assembled  DG 
mass  matrix. 

4.  Perform  the  time-step  =  RHS. 

This  recipe  can  be  applied  regardless  of  whether  one  constructs  a  conforming  or  non-conforming 
method.  The  non-conforming  element  treatment  will  affect  only  the  inter-element  communication 
step.  Even  though  this  step  is  different  between  the  methods,  it  is  desirable  to  construct  both  non- 
conforming  edge  algorithms  in  a  similar  fashion.  As  discussed  in  previous  sections,  using  the  mortar 
element  method  one  can  incorporate  the  integral  projection  method  by  [186]  as  well  as  the  pointwise- 
matching  method  into  the  same  framework.  The  implementation  of  a  unified  CG/DG  method  with 
non- conforming  interfaces  is  described  in  detail  in  [184;  185].  Here  we  outline  the  general  approach  to 
both  CG  (using  pointwise-matching  method)  and  DG  (using  integral  projection  method)  treatment 
of  non-conforming  interfaces. 
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Fig.  28:  Schematic  of  the  mortar  based  non-conforming  methods  for  both  CG  (a)  and  DG  (b).  Both  methods  follow 
the  same  algorithm  of  communicating  data  to  the  mortar,  performing  operations  on  the  mortar  and  communicating 
the  data  back.  The  differences  lie  in  the  choice  of  mortars  (mortar  conforming  with  the  long  edge  plus  an  additional 
point  mortar  to  communicate  between  vertex  neighbors  for  CG,  mortars  conforming  with  short  edges  for  DG) 
and  matrices  used  to  communicate  data  to  and  from  the  non-conforming  mortar.  In  the  case  of  CG,  the  data  is 
interpolated  using  matrices  J^2  a|iri  -h  .2-  for  DG  we  use  projection  matrices  P*2  and  Pf  2- 

Fig.  28  shows  a  schematic  of  the  unified  approach  to  inter-element  communication  of  CG  and 
DG  methods  for  conforming  and  non- conforming  interfaces.  Panel  (a)  shows  the  interpolation-based 
mortar  point  wise  matching  method  used  for  CG,  while  panel  (b)  presents  the  integral  projection 
method  for  DG.  In  both  situations  the  solution  from  non-conforming  edges  is  first  communicated 
onto  mortars,  then  an  appropriate  action  is  performed  on  the  conforming  mortars  and  the  result  is 
communicated  back  to  the  element  edges.  The  first  difference  between  the  two  approaches  is  the  choice 
of  the  mortar.  For  the  CG  method  we  choose  the  mortar  to  be  conforming  with  the  longer  parent  edge 
(so  called  long  rule  [265]),  while  for  the  DG  method  we  choose  the  shorter,  children  edges  to  define 
the  size  of  the  mortar  ( short  rule).  Additionally,  for  the  CG  method  we  need  to  define  an  additional, 
point  mortar  to  ensure  the  communication  between  vertex  neighboring  elements. 

The  second  difference  between  the  two  methods  is  the  choice  of  non-conforming  communication 
matrices.  For  CG  we  perform  an  interpolation  with  the  matrix  J/-: 

Jk,ij  =  hi(£,k(zj)),  fc  =  1,2, 

where  hi  are  the  basis  functions  defined  on  the  mortar,  is  a  map  from  the  element  edge  coordinate 
z  to  the  mortar  coordinate  £,  and  zj  is  the  coordinate  of  the  j- th  nodal  point  at  the  element  edge. 
The  interpolation  matrices  J *.  will  scatter  the  solution  from  the  mortar  to  two  children  element  edges. 
The  opposite  action,  gather,  is  performed  by  the  transpose  matrices  JT .  The  matrices  J \  correspond 
to  the  operation  Q  showed  in  Fig.  27. 

For  the  DG  method  the  communication  from  parent  element  (long  edge)  to  mortars  requires 
projection  matrices,  defined  in  [186]  by  imposing  the  integral  condition  on  a  non-conforming  interface 

J  (<&(0  -  q(Hk(z)))mdZ  =  0,  k  =  1,2 

where  g^(£)  is  the  solution  projected  at  the  fc-tli  mortar,  q(£k(z))  is  the  solution  at  the  parent  edge, 
£  is  the  coordinate  defined  on  the  mortar,  z  is  the  coordinate  of  the  parent  edge  and  (z)  is  the  map 
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between  the  parent  edge  and  k- th  mortar  coordinates.  This  integral  condition  can  be  expressed  in 
matrix  form  as: 

MqkM~Skq  =  0, 

where  My  =  and  Skj  =  J J1  1  The  projection  operation  is  defined  as 

follows 

qkM  =  M-1Skq=P%q. 

P?  is  the  scatter  projection  matrix  used  to  communicate  the  solution  from  parent  edge  to  a  mortar. 
The  communication  in  the  opposite  direction  is  achieved  by  a  gather  projection  matrix 

<l  =  J2PkqM' 

k= l 

where  P®  =  \M~1(ySk)T .  The  details  are  presented  in  [186;  184]. 

Finally,  the  third  difference  between  inter-element  communication  of  CG  and  DG  methods  lies  in 
the  action  performed  on  the  mortar.  Once  the  solutions  from  element  edges  are  interpolated/projected 
onto  mortars,  the  CG  method  averages  the  solution  before  scattering  it  back  to  respective  edges,  while 
in  the  DG  method  we  need  to  compute  a  flux,  which  is  then  projected  back  to  the  elements.  This  action 
is  identical  between  conforming  and  non- conforming  edges,  as  mortars  are  by  design  conforming  to  each 
other.  Note  that  whenever  the  mortar  and  element  edge  spaces  are  conforming,  the  communication 
matrix  (regardless  of  whether  it  is  interpolation  or  projection)  is  always  the  identity  matrix.  It  is  also 
important  to  note  that  while  Fig.  28  presents  schematics  for  both  methods,  the  actual  implementation 
in  the  code  may  differ.  Details  of  CG  interpolation  method  implementation  is  presented  in  [185],  along 
with  a  unified  implementation  of  the  CG/DG  method. 

5.3.5  Performance  of  the  non- conforming  adaptive  mesh  refinement 

Adaptive  mesh  refinement  and  element-based  Galerkin  methods  are  a  natural  match  due  to  the 
element-local  nature  of  most  computationally  intensive  calculations.  The  edge-based  computations 
can  also  be  implemented  efficiently  even  for  non-conforming  edges.  Fig.  29  presents  the  plot  of  speed¬ 
up  achieved  by  using  the  AMR  algorithm  against  the  ratio  of  the  element  count  in  a  reference  sim¬ 
ulation  to  the  average  element  count  of  an  AMR  simulation.  If  there  were  no  AMR  overhead,  the 
simulation  with  element  ratio  of  10  would  yield  a  speed-up  of  10,  which  is  represented  by  the  solid 
black  line.  The  study  was  performed  for  three  different  time  integration  methods,  namely  the  explicit 
Runge-Kutta  3rd  order  5-stage  method,  and  two  implicit-explicit  second  order  methods  BDF2  and 
ARK2.  The  explicit  method  virtually  overlaps  with  the  ideal  speed-up  line,  while  the  IMEX  methods 
show  more  AMR  overhead.  The  sources  of  this  decreased  speed-up  are  studied  in  [184].  This  result 
shows,  however,  that  the  adaptive  mesh  algorithm  can  be  implemented  with  minimal  overhead. 

Table  3  shows  the  breakdown  of  time  spent  in  different  parts  of  the  code  that  was  run  on  a  single 
CPU  with  detailed  timings  for  different  parts  of  the  AMR  algorithm.  The  biggest  share  of  time  spent 
in  adapting  the  mesh  falls  on  the  criterion  evaluation,  even  for  the  very  simple  threshold  criterion 
used  in  this  study.  The  time  spent  on  evaluating  fluxes  on  non-conforming  interfaces  is  in  the  range  of 
2  —  3%  for  all  the  methods,  however  this  cannot  be  considered  an  overhead.  A  single  non-conforming 
interface  replaces  two  regular  faces,  therefore  the  time  spent  in  computing  those  fluxes  should  be 
compared  with  the  time  the  code  would  spent  on  computing  the  fluxes  on  the  faces  that  the  non- 
conforming  interface  replaces.  Additionally,  due  to  the  non-conforming  faces  we  are  able  to  reduce 
the  number  of  elements  in  the  domain  and  therefore  reduce  the  time  spent  on  the  volume  integration. 
The  important  message  is  that  the  computation  of  fluxes  on  non-conforming  interfaces  has  a  minor 
cost  compared  to  other  parts  of  the  code. 

Finally,  the  element-based  Galerkin  methods  preserve  their  original  mass  conservation  properties 
even  when  used  with  non-conforming  meshes.  This  is  discussed  in  detail  in  [185]  and  presented  Fig.  30. 


EBG  in  atmospheric  modeling 


61 


Fig.  29:  Performance  of  the  adaptive  mesh  algorithm  with  the  DG  method.  Speed-up  is  defined  as  the  ratio  of 
runtime  of  a  reference  simulation  with  a  fully  refined  mesh  to  the  runtime  of  the  adaptive  simulation.  The  element 
ratio  measures  the  ratio  of  element  count  of  the  fully  refined  simulation  to  the  average  element  count  of  the 
AMR  simulation.  Black  solid  line  represents  ideal  speed-up.  Results  are  reported  for  an  explicit  Runge-Kutta 
time  integration  (RK35)  and  two  implicit-explicit  schemes  (BDF2  and  ARK2).  The  simulation  was  set-up  as  a 
standard  density  current  test  case.  Source:  [184] 

Table  3:  Timings  (in  seconds  of  runtime)  and  percentage  breakdown  of  different  AMR  components  for  the  density 
current  case.  Source:  [184]. 


RK35 

BDF2 

ARK2 

total 

5292  (100%) 

643.6  (100%) 

967.8  (100%) 

volume  integrals 

2876.1  (54.3%) 

274.46  (42.6%) 

362.9  (37.5%) 

face  integrals 

507.67  (9.59%) 

66.59  (10.3%) 

80.1  (8.29%) 

non-conforming  faces 

116.79  (2.21%) 

21.33  (3.31%) 

23.65  (2.44%) 

AMR: 

6.030  (0.11%) 

6.291  (0.98%) 

6.025  (0.62%) 

criterion  evaluation 

3.463 

3.517 

3.527 

mesh  manipulation 

0.051 

0.05 

0.051 

data  projection 

0.965 

1.247 

0.965 

other 

1.412 

1.450 

1.454 

Both  CG  and  DG  methods  conserve  mass  to  machine  precision  even  for  very  long  simulation  times. 
In  the  density  current  case  presented  in  Fig.  30  the  typical  simulation  time  is  900s,  while  the  test  was 
performed  until  10,000s. 
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time 


Fig.  30:  Mass  conservation  as  a  function  of  simulation  time  for  a  dynamically  adaptive  mesh.  Triangular  markers 
denote  the  CG  results  and  circular  markers  represent  the  DG  results.  Colors  represent  different  polynomial  orders. 
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6  Summary 

After  briefly  reviewing  the  different  mathematical  models  of  the  atmosphere  (i.e.  equation  sets),  in 
this  article  we  have  concentrated  on  their  numerical  solution  by  means  of  element-based  Galerkin 
methods  (EBG).  The  emphasis  was  given  to  the  finite  and  spectral  element  methods  on  the  one 
hand  and  nodal  discontinuous  Galerkin  on  the  other.  Element-based  Galerkin  methods  appear  to 
be  perfectly  suited  for  building  next-generation  climate  and  Numerical  Weather  Prediction  (NWP) 
models  because  of  their  geometric  flexibility  (unstructured  and  AMR)  and  high  parallel-scalability  (on 
both  CPU  and  GPU-based  architectures;  see,  e.g.  [231]  and  citations  therein.)  There  are  issues  that 
remain  essentially  unsolved  (e.g.,  stabilization)  but,  as  we  have  shown  in  this  manuscript,  there  are 
numerous  options  that  appear  to  satisfy  the  requirements  for  building  robust,  efficient,  and  positivity¬ 
preserving  numerical  models.  Finally,  because  atmospheric  modeling  is  aiming  at  higher  and  higher 
resolution  simulations  and,  hence,  well  resolved  topography,  we  discussed  the  issue  of  grid  generation 
and  gave  some  insight  on  its  evolution  in  NWP. 
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